Preface


MATRIX is Australia’s international and residential mathematical research institute. 
It was established in 2015 and launched in 2016 as a joint partnership between 
Monash University and The University of Melbourne, with seed funding from the 
ARC Centre of Excellence for Mathematical and Statistical Frontiers. The purpose 
of MATRIX is to facilitate new collaborations and mathematical advances through 
intensive residential research programs, which are currently held in Creswick, a 
small town nestled in the beautiful forests of the Macedon Ranges, 130 km west of 
Melbourne. 
This book is a scientific record of the eight programs held at MATRIX in 2017: 


• Hypergeometric Motives and Calabi-Yau Differential Equations
• Computational Inverse Problems
• Integrability in Low-Dimensional Quantum Systems
• Elliptic Partial Differential Equations of Second Order: Celebrating 40 Years of
Gilbarg and Trudinger’s Book
• Combinatorics, Statistical Mechanics, and Conformal Field Theory
• Mathematics of Risk
• Tutte Centenary Retreat
• Geometric R-Matrices: from Geometry to Probability


The MATRIX Scientific Committee selected these programs based on scientific 
excellence and the participation rate of high-profile international participants. This 
committee consists of: Jan de Gier (Melbourne University, Chair), Ben Andrews 
(Australian National University), Darren Crowdy (Imperial College London), Hans 
De Sterck (Monash University), Alison Etheridge (University of Oxford), Gary 
Froyland (University of New South Wales), Liza Levina (University of Michigan), 
Kerrie Mengersen (Queensland University of Technology), Arun Ram (University 
of Melbourne), Joshua Ross (University of Adelaide), Terence Tao (University of 
California, Los Angeles), Ole Warnaar (University of Queensland), and David Wood 
(Monash University). 

These programs involved organisers from a variety of Australian universities,
including Australian National University, Monash University, Queensland
University of Technology, University of Newcastle, University of Melbourne,
University of Queensland, University of Sydney, University of Technology Sydney,
and University of Western Australia, along with international organisers and
participants.

Each program lasted 1-4 weeks, and included ample unstructured time to 
encourage collaborative research. Some of the longer programs had an embedded 
conference or lecture series. All participants were encouraged to submit articles to 
the MATRIX Annals. 

The articles were grouped into refereed contributions and other contributions. 
Refereed articles contain original results or reviews on a topic related to the 
MATRIX program. The other contributions are typically lecture notes or short 
articles based on talks or activities at MATRIX. A guest editor organised appropriate 
refereeing and ensured the scientific quality of submitted articles arising from each 
program. The Editors (Jan de Gier, Cheryl E. Praeger, Terence Tao and myself) 
finally evaluated and approved the papers. 


Many thanks to the authors and to the guest editors for their wonderful work. 


MATRIX is hosting eight programs in 2018, with more to come in 2019; see 
www.matrix-inst.org.au. Our goal is to facilitate collaboration between researchers 
in universities and industry, and increase the international impact of Australian 
research in the mathematical sciences. 


David R. Wood 
MATRIX Book Series Editor-in-Chief 


Hypergeometric Motives and Calabi-Yau 
Differential Equations 


8-27 January 2017 
Masha Vlasenko
Polish Academy of Sciences


Wadim Zudilin 
The majority of the articles presented below are extended abstracts of the talks
given by program participants at the workshop that took place from January 16 to 
20, 2017. Some of them present a new perspective or results that appeared due to 
collaboration following the activity in Creswick. 

The two main topics of the program, Calabi-Yau differential equations and 
hypergeometric motives, provide an explicit approach and experimental ground 
to such important themes in contemporary arithmetic geometry as the Langlands 
program, motives and mirror symmetry. Hypergeometric motives are families of 
motives whose periods are given by generalised hypergeometric functions. Their 
L-functions are expected to cover a wide range of known L-functions. Due to the 
recent work of researchers (many of whom were present in Creswick) it is now 
possible to compute L-functions of hypergeometric motives efficiently. Thus one 
can test the standard conjectures, e.g. on special values and modularity, for motives 
of any degree and weight. Many algorithms for computing with the hypergeometric 
motives are now implemented in the computer algebra system Magma. 

Local factors of hypergeometric L-functions can be investigated by means of
finite hypergeometric functions, another topic to which a few articles in this volume
are devoted. The techniques developed by the authors allow one to transport classical
formulas to the finite field setting, count points on algebraic varieties over finite
fields, and study their congruence properties and Galois representations. Importantly,
finite hypergeometric functions can be viewed as periods of motives over finite
fields.

Periods over finite fields offer a new angle for understanding the integrality
phenomenon arising in mirror symmetry. Originally discovered by physicists in the
mid-1980s, mirror symmetry remains one of the central research themes binding string
theory and algebraic geometry. Numerous examples show that the expression of the
mirror map in so-called canonical coordinates possesses rich arithmetic properties.
This expression involves particular solutions to a Picard-Fuchs differential equation
of a family of Calabi-Yau manifolds near a singular point. The application of p-adic
methods to the study of Calabi-Yau differential equations offers a very promising
perspective, as announced in the final article by Duco van Straten.

The three weeks at the MATRIX institute were intense and fruitful. To illustrate 
these words, there was a special lecture by Fernando Rodriguez Villegas scheduled 
at the very last moment on Thursday afternoon of the workshop week, in which 
he presented, jointly with David Roberts and Mark Watkins, a new conjecture on 
motivic supercongruences that was invented in Creswick. This talk influenced what 
happened in the last week of the workshop. David Broadhurst gave his two lectures 
on the very first and very last days of the program, reporting in the second talk on 
the tremendous progress achieved by him in collaboration with David Roberts over 
the three weeks. 

We are confident that ideas and projects that emerged during the program will 
drive our field of research in the coming years. 


Masha Vlasenko 
Guest Editor 
Participants


James Wan (SUTD, Singapore), Fang-Ting Tu (Louisiana State), Yifan Yang 
(National Chiao Tung University), Eric Delaygue (Institut Camille Jordan, Lyon), 
John Voight (Dartmouth), Adriana Salerno (Bates College), Alex Ghitza (Melbourne),
Mark Watkins (Sydney), Piotr Achinger (IHES) with Helena, Jan de Gier
(Melbourne), David Broadhurst (Open University), Ole Warnaar (Queensland), 
Ravi Ramakrishna (Cornell), Fernando Rodriguez Villegas (ICTP, Trieste), Sharon 
Frechette (College of the Holy Cross), Robert Osburn (University College Dublin), 
Frits Beukers (Utrecht), Paul Norbury (Melbourne), David Roberts (Minnesota 
Morris), Duco van Straten (Johannes Gutenberg), Holly Swisher (Oregon State), 
Abdellah Sebbar (Ottawa) 


Computational Inverse Problems 


11-23 June 2017 
Organisers 
Tiangang Cui 
Monash Uni 


Hans De Sterck 
Monash Uni


Markus Hegland 
Australian National Uni 


Youssef Marzouk 
Massachusetts Inst Tech 


Ian Turner
Queensland Uni Tech 


Karen Willcox 
Massachusetts Inst Tech


The integration of complex data sets into large-scale computational models is one 
of the central challenges of modern applied mathematics. This challenge is present 
in almost every application area within science and engineering, e.g., geosciences, 
biological systems, astrophysics, meteorology, aerospace, and subsurface flow. At 
the heart of this challenge often lies an inverse problem: we seek to convert indirect 
data into useful characterisations of the unknown model parameters including 
source terms, initial or boundary conditions, model structure, physical coefficients, 
etc. Solution of the inverse problem, along with model prediction and uncertainty 
assessment, can be cast in a Bayesian setting and thus naturally characterised by the 
posterior distribution over unknown parameters conditioned on the data.
Unfortunately, solution of such statistical inverse problems for systems governed by
large-scale, complex computational models has traditionally been intractable: models are
complicated and computationally expensive to evaluate; available indirect data are
often limited, noisy, and subject to natural variation; inversion algorithms often scale
poorly to high-dimensional, or in principle infinite-dimensional, model parameters.

Our program contributed to the active international research effort in
computational mathematics to connect theoretical developments with algorithmic
advancements, buoyed by a range of cutting-edge applications. The program attracted a total
of 47 attendees from a diverse range of highly relevant fields. Our attendees included
renowned researchers in numerical analysis, scientific computing, optimisation,
and stochastic computation, as well as high-profile domain experts working in
meteorology, super-resolution imaging, aerospace, and subsurface. The program
began with a week-long mini-conference, during which seven 45-min plenary
presentations and twenty 30-min invited presentations were scheduled. In
the second week, we organised thirteen 45-min presentations in the mornings and
reserved afternoons for collaboration.

During the program, our attendees presented and collaborated extensively on the
following key topics in computational inverse problems:


• Deterministic and statistical methods for inverse problems.
• Advanced Markov chain Monte Carlo and quasi-Monte Carlo methods.
• Optimal transport theory and its current and potential applications in inverse
problems.
• Model reduction methods and multi-scale methods.
• Scalable experimental design methods.
• High performance numerical solvers, including multilevel methods.
• Applications in geothermal engineering, additive manufacturing, aeronautics,
remote sensing, and super-resolution imaging.


The articles in these proceedings represent different aspects of the program. For
example, Bardsley and Cui describe optimisation-based methods for nonlinear
hierarchical Bayesian inverse problems; Fox et al. present a novel method for
sequential inverse problems using the Frobenius-Perron operator; MacNamara,
McLean and Burrage present an adaptive contour integration method for solving
master equations; Guo, Loeper, and Wang present initial investigations of using
optimal transport to solve inverse problems in finance; Harrach and Rieger present a set
optimisation technique for reconstructing electrical impedance tomography images
from a single measurement; Haario et al. investigate new ideas on characterising
chaotic stochastic differential equations; Lamminpää et al. present a case study on
atmospheric remote sensing; and Ye, Roosta-Khorasani, and Cui present an extensive
survey on optimisation methods used in inverse problems.

We would like to thank all of the authors who took the time to contribute to this
volume. We would also like to thank the MATRIX staff and officials for hosting and
facilitating this wonderful event and giving us the opportunity to share our work
through this volume.


Tiangang Cui and Hans De Sterck 
Guest Editors 
Participants


Bart van Bloemen Waanders, Benjamin Peherstorfer, Colin Fox, Gregoire Loeper, 
Habib N. Najm, Harriet Li, Heikki Haario, Janosch Rieger, Jinglai Li, John
Bardsley, Josef Dick, Kate Lee, Kody Law, Lutz Gross, Marko Laine, Nan Ye, 
Oliver Maclaren, Olivier Zahm, Omar Ghattas, Qinian Jin, Tianhai Tan, Tim Garoni, 
Zheng Wang, Elizabeth Qian, Gianluca Detommaso, Ruanui Nicholson, Elvar 
Bjarkason, Fred Roosta, Shev MacNamara, Alessio Spantini, Amani Alahmadi, 
Hoang Viet Ha, Mia (Xin) Shu, Carl (Ao) Shu, Thien Binh Nguyen, Oliver Krzysik, 
Brad Marvin, Ellen B Le, Jesse Adams, Hongbo Xie, Hans Elmlund, Cyril Reboul 


Integrability in Low-Dimensional 
Quantum Systems 


26 June–21 July 2017


Organisers 
Murray Batchelor 
Australian National Uni 
Patrick Dorey
Chaiho Rim
Sogang, Seoul


Clare Dunning 
Uni Kent 


This MATRIX program focused on aspects of integrability in low-dimensional 
quantum systems and areas of application. It was organized around currently 
active hot topics and open problems. The emphasis was on focused research 
and interaction in small groups to achieve real collaboration. The research topics 
included: 


• AdS/CFT
• Bethe ansatz and quantum spin chains
• Bulk and boundary conformal and quantum field theory
• Cold atoms, strongly correlated systems
• Integrability in models of matter-light interaction
• Logarithmic CFT
• ODE/IM and its massive variants
• Quantum quenches and quantum entanglement
• Random matrix approach to CFT and integrability


Among the integrability community, this workshop was a major event on the
international scene and enabled us to bring together scientists at the leading
edge of research in integrable quantum systems in low dimensions. Indeed, with
59 participants over 3 weeks, a significant proportion of the active world-wide
community working on quantum integrability was in attendance.

Classical integrability of two-dimensional systems and the related quantum
integrability of one-dimensional systems are finding areas of application in statistical
physics, condensed matter physics and particle physics, in addition to contributing to
highly mathematical topics such as Yang-Baxter algebras, quantum groups, cluster
algebras, affine Lie algebras and combinatorial representation theory. With a series
of Introductory Lectures on Hot Topics and advanced seminars, this workshop
offered extensive training to graduate students and Early Career Researchers
working in integrability and related topics.

Among the many highlights of the meeting were the announcements of (1) the
analytic calculation of the conformal partition functions of two-dimensional critical
percolation, (2) the demonstration of the quantum toroidal integrability behind the
AGT correspondence and (3) some striking progress on the mathematical
description of fusion within the affine Temperley-Lieb algebra. Contributed articles
included in these MATRIX Annals cover the topics of (1) form factors, (2) the
combinatorics and generating functions of RNA structures, (3) supersymmetric
quantum chains and (4) proofs of factorization and sum-to-1 properties of the AY
face models. During the program there were also several groups of collaborators
informally reporting rapid progress, including (1) a collaboration explaining the
mysteries of Baxter’s Q-matrix for sl(2) models at roots of unity and (2) a
collaboration deriving analytically the correlation functions and conformal weights of
critical dense polymers. Many physical applications to quantum quenches, ultracold
atoms and matter-light interaction were also showcased during the meeting. All of
these represent significant advances in our discipline.

We gratefully acknowledge the generous support of our sponsors: MATRIX, the
Australian Mathematical Sciences Institute (AMSI), the Australian Mathematical
Society (AustMS) and the Asia Pacific Center for Theoretical Physics (APCTP).
We particularly thank Jan de Gier for his encouragement in bringing this program
together. We also thank the very helpful MATRIX staff at the Melbourne and Creswick
campuses, as well as our outstanding chef Adam, for their many significant
contributions to the success of this meeting. Lastly, we thank the authors who kindly
took the time and made the effort to contribute to this volume.


Chaiho Rim and Paul Pearce 
Guest Editors 
Participants


Changrim Ahn (Ewha, Seoul, Korea), Zoltan Bajnok (Wigner, Hungary),
Jean-Emile Bourgine (KIAS, Seoul, Korea), Daniel Braak (Augsburg, Germany),
Junpeng Cao (CAS, Beijing, China), Sang Kwan Choi (Sichuan University, China), Ed
Corrigan (York, UK), György Fehér (Budapest, Hungary), Angela Foerster (Rio
Grande do Sul, Brazil), Holger Frahm (Hannover, Germany), Azat Gainutdinov
(Tours, France), Frank Göhmann (Wuppertal, Germany), Xiwen Guan (CAS,
Wuhan, China), Jesper Jacobsen (ENS, Paris, France), Shashank Kanade (Alberta,
Canada), Andreas Klümper (Wuppertal, Germany), Karol Kozlowski (Lyon, France),
Atsuo Kuniba (Tokyo, Japan), Masahide Manabe (Warsaw, Poland), Chihiro Matsui
(Tokyo, Japan), Yutaka Matsuo (Tokyo, Japan), Jianin Mei (Dalian, China), Alexi
Morin-Duchesne (Louvain, Belgium), Rafael Nepomechie (Miami, USA), Ovidiu
Patu (Bucharest, Romania), Francesco Ravanini (Bologna, Italy), Yvan
Saint-Aubin (Montréal, Canada), Kareljan Schoutens (Amsterdam, Netherlands), Junji
Suzuki (Shizuoka, Japan), Gabor Takacs (Budapest, Hungary), Masato Wakayama
(Kyushu, Japan), Yupeng Wang (CAS, Beijing, China), Waltraut Wustmann
(Maryland, USA), Wen-Li Yang (Xian, China), Hong Zhang (ITP, Beijing, China),
Huan-Xiang Zhou (Chongqing, China), Rui-Dong Zhu (Tokyo, Japan), Zeying Chen (Uni
Melbourne), Jan de Gier (Uni Melbourne), Omar Foda (Uni Melbourne), Alexandr
Garbali (Uni Melbourne), Phil Isaac (Uni Queensland), Kazuya Kawasetsu (Uni
Melbourne), Sergii Koval (Australian National Uni), Jon Links (Uni Queensland),
Tianshu Liu (Uni Melbourne), Vladimir Mangazeev (Australian National Uni),
Thomas Quella (Uni Melbourne), Jorgen Rasmussen (Uni Queensland), David
Ridout (Uni Melbourne), Boris Runov (Australian National Uni), William Stewart (Uni
Melbourne), Michael Wheeler (Uni Melbourne), Paul Zinn-Justin (Uni Melbourne)


Elliptic Partial Differential Equations of 
Second Order: Celebrating 40 Years of 
Gilbarg and Trudinger’s Book 


16–28 October 2017
Organisers 

Lucio Boccardo 
Sapienza Uni Roma 


Florica-Corina Cirstea 
Uni Sydney 


Julie Clutterbuck 
Monash Uni 


L. Craig Evans 
Uni California Berkeley 


Enrico Valdinoci 
Uni Melbourne 




Paul Bryan 
Macquarie Uni 


Our program celebrated the 40th anniversary of the publication of Gilbarg and 
Trudinger’s highly influential “Elliptic Partial Differential Equations of Second 
Order”, one of the most highly cited texts in mathematics (over 10,000 citations). 
We sought to link past research with future perspectives by discussing what the
important developments in the area during these 40 years have been and what the
new trends of contemporary research are. Particular attention was given to some
of the topics for which the book served as a great source of inspiration, such as
fully nonlinear PDEs, viscosity solutions, Hessian equations, optimal transport,
the stochastic point of view, geometric flows, and so on.

The first week of the program consisted of a series of introductory lectures 
aimed at Ph.D. students and postdocs, featuring in particular lectures given by Neil 
Trudinger himself. Special thanks go to Connor Mooney, who gave a beautiful series
of lectures with only 24 hours’ notice after a late cancellation by the original lecturer due
to illness. The lectures were:


• Estimates for fully nonlinear equations, Neil Trudinger
• Mean Curvature Flow with free boundary, Valentina Wheeler
• Optimal regularity in the Calculus of Variations, Connor Mooney


The second week was devoted to research. During this week, three to four 
research lectures were held per day with the remainder of the time devoted 
to research collaboration and the general enjoyment of the beautiful Australian 
bushland surrounding the institute. Arising from the research workshop were several 
submissions included in these proceedings. The papers deal with topics such as 
variational problems, particularly non-linear geometric problems, optimal transport, 
regularity properties and spectral properties of elliptic operators. Neil Trudinger,
one of the two authors whom our program honoured, is famous for his work on
such problems. As such, each submission represents a continuation of the legacy of
the book “Elliptic partial differential equations of second order” and its continuing 
influence on mathematics. 


• “Boundary regularity of mass-minimizing integral currents and a question of
Almgren” by Camillo De Lellis, Guido De Philippis, Jonas Hirsch and Annalisa
Massaccesi: This paper is an announcement of results to be published in detail in
a forthcoming article. The results describe boundary regularity of area-minimising
currents in high codimension, ensuring that regular points are dense on the
boundary and leading to a structure theorem answering in particular a question
of Almgren, implying that singular points on the boundary have low dimension,
and yielding a monotonicity formula. The announced results, representing a
continuation of Allard’s regularity theorem and the monumental “Big Regularity
Paper” of Almgren, were completed during the second week of our program.

• “Optimal transport with discrete mean field interaction” by Jiakun Liu and
Grégoire Loeper: This paper is also an announcement of ongoing work motivated
by the motion of self-gravitating matter governed by the Euler-Poisson system.
Here the authors build on the first author’s formulation of the problem as a
variational problem, which is then solved using optimal transport techniques
exploiting Monge-Kantorovich duality. The result considers a time-discretisation
of more general variational problems, obtaining regularity results.

• “A sixth order curvature flow of plane curves with boundary conditions” by
James McCoy, Glen Wheeler and Yuhan Wu: This submission announces results
for a high order curvature flow. Curvature flows, particularly those arising via
variational principles, have been extensively studied over the past 30 years.
A prototypical example is the Mean Curvature Flow, a second order gradient
flow, whilst the Willmore flow is perhaps the most well-known example of a
higher order gradient flow. The authors describe a sixth order gradient flow with free
boundary arising in elasticity theory. The maximum principle arguments that
feature heavily in second order flows are of no utility in higher order flows and
must be replaced by other techniques. The authors employ a Poincaré inequality
and interpolation inequalities to develop suitable integral estimates leading to the
conclusion that the flow smoothly converges to the optimal configuration.

• “Quasilinear parabolic and elliptic equations with singular potentials” by Maria
Michaela Porzio: This paper considers quasilinear equations with singular Hardy
potentials. A well-known result is that solutions to the heat equation with positive
driving term and positive initial data do not exist for Hardy parameters larger
than the optimal constant in Hardy’s inequality. This particular paper considers
divergence form operators and in particular the asymptotic behaviour of solutions
to such equations provided the Hardy parameter is sufficiently small. Unique
global existence of solutions is obtained, with decay estimates implying that the
asymptotic behaviour as t → ∞ is independent of the initial data and hence
uniqueness of the associated elliptic problem is assured.

• “How to hear the corners of a drum” by Medet Nursultanov, Julie Rowlett, and
David Sher: This paper is a detailed announcement of ongoing work. A
well-known question asks whether one can “hear the shape of a drum”. The answer in
general is no: there exist non-congruent domains for which the Laplacian has
the same spectrum. The result of this paper says that smooth domains may
be distinguished from domains with corners by the spectrum of the Laplacian.
More precisely, the spectrum of the Laplacian on a smooth domain with either
Neumann or Robin boundary conditions is never equal to that of the Laplacian
on a domain with corners and either Neumann or Robin boundary conditions.
The result hinges on a generalisation of Kac’s locality principle.


Special thanks go to Julie Clutterbuck for her wonderful depiction of a well-worn
copy of Gilbarg and Trudinger’s book featured on our program materials.


Paul Bryan 
Guest Editor 
Participants


Yann Bernard (Monash Uni), Norman Dancer (Uni Sydney), Camillo De Lellis
(ETH Zurich), Guido de Philippis (SISSA Trieste), Serena Dipierro (Uni
Melbourne), Yihong Du (Uni New England), Nicola Fusco (Uni Napoli), Jesse
Gell-Redman (Uni Melbourne), Joseph Grotowski (Uni Queensland), Feida Jiang
(Tsinghua Uni), Gary Lieberman (Iowa State Uni), Jiakun Liu (Uni Wollongong),
Gregoire Loeper (Monash Uni), Connor Mooney (Uni Texas Austin), Aldo Pratelli
(Erlangen-Nürnberg), Maria Michaela Porzio (Roma), Frédéric Robert (Uni
Lorraine), Julie Rowlett (Chalmers Uni), Mariel Saez (Pontificia Uni, Chile), Neil
Trudinger (Australian National Uni), John Urbas (Australian National Uni), Jerome
Vetois (McGill Uni), Xu-Jia Wang (Australian National Uni), Valentina Wheeler
(Uni Wollongong), Bin Zhou (Australian National Uni), Graham Williams (Uni
Wollongong), James McCoy (Uni Wollongong)


Combinatorics, Statistical Mechanics, 
and Conformal Field Theory 


29 October–18 November 2017
Organisers 

Vladimir Korepin 

Stony Brook Uni 


Vladimir Mangazeev 
Australian National Uni 


Bernard Nienhuis 
Uni Amsterdam 


Jorgen Rasmussen 
Uni Queensland 


This program brought together leading experts in and around the area where 
statistical mechanics, integrability, conformal field theory, and combinatorics meet 
and in some sense overlap. A primary goal was to encourage research collaborations 
across the traditional divides between the research communities interested in the 
separate disciplines. Significant recent developments stem from this kind of
cross-fertilisation, and the aim was to cultivate further such collaborations and widen the
scope of their successes.

The scientific presentations and discussions were largely centred around
Yang-Baxter integrable models; the Razumov-Stroganov conjecture and generalisations
thereof; combinatorial points and the role of supersymmetry in integrable lattice
models and quantum chains; the combinatorics of spanning trees and
pattern-avoiding permutations; and logarithmic conformal field theory.

With the strong emphasis on collaborations and discussions, there were only a
couple of seminars per day in the first and third weeks. The embedded AMSI
Workshop took place in the second week and included talks by Bazhanov, Guttmann,
Hagendorf, Mangazeev, Nienhuis, Pearce, Ridout, Ruelle, Tartaglia, Weston and
Wheeler. Sessions with informal and brief presentations were held throughout the
program, with the aim of expanding collaborative work on existing research projects
and fostering new ideas and collaborations.

The contribution to this volume by Bernard Nienhuis and Kayed Al Qasimi 
was directly stimulated by discussions with Christian Hagendorf at the MATRIX 
Workshop. It provides a proof of a conjecture on certain one-point functions related 
to the Razumov-Stroganov conjecture. 


Jorgen Rasmussen 
Guest Editor 


Participants 


György Fehér (Budapest Uni Technology and Economics), Christian Hagendorf
(Catholic Uni Louvain), Philippe Ruelle (Catholic Uni Louvain), Elena Tartaglia
(SISSA Trieste), Murray Batchelor (Australian National Uni), Vladimir Bazhanov
(Australian National Uni), Zeying Chen (Uni Melbourne), Omar Foda (Uni
Melbourne), Jan de Gier (Uni Melbourne), Alexandr Garbali (Uni Melbourne), Tony
Guttmann (Uni Melbourne), Jon Links (Uni Queensland), Paul Pearce (Uni
Melbourne), Thomas Quella (Uni Melbourne), David Ridout (Uni Melbourne),
Alessandra Vittorini-Orgeas (Uni Melbourne), Michael Wheeler (Uni Melbourne), Paul
Zinn-Justin (Uni Melbourne), Robert Weston (Heriot-Watt Uni), Atsuo Kuniba
(Tokyo Uni)
Kostya Borovkov
Tokyo Metropolitan Uni
The mathematical modelling of the various types of risk modern society
encounters at different levels of its operation has become an important part of applied
mathematics as well as a source of challenging theoretical problems. The need to
model risk is greatest where the risk poses a serious danger to society and
nature. As illustrated by the Global Financial Crisis of 2007-2008, the finance
industry is one of the most serious sources of risk. Since the finance industry tried
to (at least, partly) blame mathematical models for what had happened, it is all the
more important for mathematicians to address the issue of financial risk and use
mathematics to find ways to mitigate it.

The need for quantitative risk modelling has, in recent years, attracted enormous
worldwide attention. The risk related to both extreme and non-extreme events
is generating vast research activity, which is international by its very nature.
Moreover, there is an international regulatory aspect concerning mathematical
modelling of financial risks. One of the key elements of the current versions
of the Basel accord (a global regulatory framework for bank capital adequacy,
stress testing, and market liquidity risk) is the emphasis on responsible use of
mathematical models.

Our program mostly addressed various aspects of mathematical modelling and 
subsequent analysis of risks related to activities in the finance industry and, more 
generally, economics. Major attention was also paid to studying the mathematical 
theory that can be used to model more general types of risk, related to chronic and 
long-term hazards and extremes, as well as the interplay between them. 

The key themes of the program included: 


• the modelling of uncertainty and risk events using the theory of stochastic
processes, in particular, the evaluation of the distributions of boundary functionals of
random processes, including the computation of boundary crossing probabilities;
• new methods for and approaches to computing the prices of financial derivatives;
• systemic risk, including the stability of national and world financial systems,
consequences for the markets from the wide use of algorithmic trading, network
modelling of relevant real-life systems;
• risk modelling and quantification, including risk measures;
• the analysis of model risk, i.e. the type of risk that arises due to using
inappropriate mathematical models for asset price dynamics etc.;
• mathematical modelling of extreme events due to factors such as natural
disasters, human errors, and failures of infrastructure and computer control systems.


Our program included two ‘embedded events’. In the first week of the program, 
we ran four 5-hour workshops for PhD students, research workers and industry 
specialists on the following topics: 


• Extreme Value Theory: Applications to risk analysis (M. Kratz); 
• Financial measures of risk and performance (M. Zhitlukhin); 
• Ruin probabilities: exact and asymptotic results (Z. Palmowski); 
• Clearing in financial networks (Yu. Kabanov). 


In the second week of the program, we hosted a research conference where 
about 20 talks were given. The slides used by both the workshop presenters and 
conference speakers are available at the program web-site, 
https://www.matrix-inst.org.au/events/mathematics-of-risk. For the present volume, two of the workshop 
presenters (M. Kratz and M. Zhitlukhin) prepared more detailed expositions of the 
material from their workshops. We are most grateful to them for their time and effort 
required to write these very interesting and instructive papers. Our thanks also go to 
the other program participants who took time to contribute to this volume. Finally, 
we would like to thank the MATRIX and Creswick campus staff for facilitating and 
hosting this event. The participants enjoyed it tremendously. We had an excellent 
opportunity to engage in joint research work and exchange our ideas, both during 
the conference week and outside it. 


Alexander Novikov and Kostya Borovkov 
Guest Editors 
The year 2017 marked the centenary of the birth of W.T. (Bill) Tutte (1917–2002), 
the great Bletchley Park cryptologist and pioneering graph theorist. This 
Retreat was part of a worldwide programme of Tutte Centenary events (see 
https://billtuttememorial.org.uk/centenary/), including events at Bletchley Park, 
Waterloo, Cambridge, and Monash. It was scheduled for the week preceding the 5th 
International Combinatorics Conference (5ICC) (http://www.monash.edu/5icc/) at 
Monash. 

The Retreat programme focused on three topics that have grown out of seminal 
contributions made by Tutte at the very start of his career: 


Tutte-Whitney Polynomials. These count a wide variety of structures associated 
with a graph, and are related to network reliability, coding theory, knot theory and 
statistical physics. They were introduced by Whitney (1932) and Tutte (1947, 
1954), and now play a central role in enumerative graph theory. They extend 
readily to matroids, which were the focus of the second topic. 

Matroid Structure Theory. Many aspects of graph theory are especially natural 
and elegant when viewed in the broader setting of matroids, which are 
combinatorial abstractions of sets of vectors under linear independence. Tutte 
developed the theory of matroid connectivity, and characterised several important 
matroid classes in terms of forbidden substructures (excluded minors). His work 
continues to inspire developments in the field. 

Symmetric Graphs. A lot of recent research on symmetric graphs builds on 
ground-breaking theory by Tutte on the trivalent case. Tutte developed the theory 
of arc-transitive graphs, and the techniques he used formed the foundations of a 
large and growing branch of discrete mathematics. 


The Retreat emphasised collaborative research supported by problem sessions. 
There were three introductory talks: an overview of Tutte’s contributions to 
mathematics, by James Oxley; Tutte-Whitney polynomials, by Gordon Royle; and 
symmetric graphs, by Marston Conder and Michael Giudici. Oxley’s talk led to the 
paper ‘The contributions of W.T. Tutte to matroid theory’ by Graham Farr and James 
Oxley, included in this volume. 

We had a total of 32 participants (including organisers). Participants found the 
workshop to be an exceptionally stimulating event precisely because, instead of 
hearing a long sequence of talks about the work of others, they got to work for 
extended periods on interesting problems with a variety of collaborators. They 
were able to develop new ideas and learn a lot about other recent work. A number 
of questions were answered relatively quickly, simply through sharing knowledge 
among participants. The more substantial research collaborations in each of the 
three themes dealt with the following. 


Tutte-Whitney Polynomials 


• An old question of Hassler Whitney (1932) about an elegant extension of the 
four-colour theorem using duality and Tutte-Whitney polynomials. 

• Chromatic polynomial of hypergraphs (with the research having been done and 
largely written up during the workshop). 

• A notion of “rank function” for certain algebraic structures which exhibit a 
pleasing duality, with the potential to generalise matroid representability and 
shed light on Tutte polynomials of other combinatorial objects. 

• One of the Merino-Welsh conjectures (correlation inequalities relating numbers 
of acyclic orientations and totally cyclic orientations of a graph). 

• The complexity of counting bases in binary matroids. 


Matroid Structure Theory 


• A problem of connectivity in frame matroids, a generalisation of the matroids 
that arise from graphs. 

• A problem about the relationship between size and rank in matroids, inspired by 
an old theorem of Edmonds. 

• Analysis and generalisation of a distinctive property of wheels and whirls, 
matroids Tutte identified as playing a fundamental role in his theory of 
3-connectivity for matroids. 

• Characterisation of the members of a natural class of matroids where the 
interaction of elements, circuits and cocircuits is particularly elegant and symmetric. 


Symmetric Graphs 


• Normal quotient analysis for finite edge-transitive oriented graphs of valency 
four. This led to the paper ‘Biquasiprimitive oriented graphs of valency four’ by 
Nemanja Poznanovic and Cheryl Praeger in this volume. 

• A question of Caprace (at Groups St Andrews, Birmingham, August 2017) on 
whether there exists a 2-transitive permutation group P such that only finitely 
many simple groups act arc-transitively on a connected graph X with local action 
P. Marston Conder gave a partial answer in his paper ‘Simple group actions on 
arc-transitive graphs with prescribed transitive local action’ in this volume. 

• Answer to a question by Folkman (1967) about (bipartite) semi-symmetric 
graphs of order 2n and valency d > n/2, to be published by Marston Conder 
and Gabriel Verret. 

• Development of the theory of LR-structures, which in some sense extend Tutte’s 
work on symmetric 3-valent graphs, and the answer to an open question on them. 

• A question related to determining the “graph-type” of a larger number of 
transitive groups. This proved quite difficult, but was solved soon after the Retreat 
in joint work with someone who could not attend. 


It is expected that about fifteen papers will result from research that began during 
the workshop week, including the three in this volume of MATRIX Annals. 

We gratefully acknowledge sponsorship by the Faculty of Information Technology, 
Monash University. 
The focus of this workshop was originally, as the title “Geometric R-matrices” 
suggests, to discuss the interaction of quantum integrable systems, a topic in 
mathematical physics, with algebraic geometry and representation theory. As the 
subtitle “from geometry to probability” indicates, it was quickly expanded to include 
interactions with other branches of mathematics, in particular combinatorics and 
probability. Here is a brief sketch of these interactions: 


Algebraic Geometry and Representation Theory In the 2000s, the idea emerged 
that the theory of quantum integrable systems could be used to study the (quantum, 
equivariant) cohomology of certain varieties that appear naturally in algebraic 
geometry and representation theory, such as Grassmannians, flag varieties, and 
related Schubert varieties, orbital varieties, etc. This was reformulated as a beautiful, 
coherent program by Maulik and Okounkov in the early 2010s, combining ideas 
from geometric representation theory and in particular Nakajima’s quiver varieties, 
other ideas from geometry (Gromov–Witten invariants, etc.), with concepts coming 
from the study of supersymmetric gauge theories (cf. the work of Nekrasov and 
Shatashvili). The quantum integrable system is defined out of the geometry by 
starting with its building block which is the R-matrix of our title. 




This area was represented during the week by the two mini-courses “Schubert 
calculus and integrability” by A. Knutson and “Geometric representation theory 
and quantum integrable systems” by A. Okounkov, as well as by several talks. 


Combinatorics and Probability Theory There is a large literature, which is still 
rapidly expanding, on the use of quantum integrable methods to study problems of 
a combinatorial or probabilistic nature. Some relevant milestones are: 


• Kuperberg’s proof of the Alternating Sign Matrix conjecture using the 
integrability of the six-vertex model in the late 1990s, followed by numerous generalizations 
and variations in the 2000s; 

• the enumeration of Plane Partitions, and more generally the study of the related lozenge 
tilings as a probabilistic model, using free fermionic techniques (the simplest 
“integrable” model) in the 2000s, in particular in the work of R. Kenyon, and 
their extension to non-free-fermionic integrable models; 

• the connection of integrability and cluster algebras, which remains partially 
mysterious, though much progress has been made recently in that area; and 

• the field of “integrable probability”, closely connected to the subject of this 
workshop but which has become sufficiently wide in itself that it had its own 
separate MATRIX program in January 2018. 


These topics were all present during this workshop via talks of our participants. 


The two contributions in this volume reflect this diversity of subjects. On the one 
hand, Y. Yang and G. Zhao’s lecture notes “How to sheafify an elliptic quantum 
group” belong to the first type of interaction, emphasizing the use of cohomological 
Hall algebras to build geometrically the underlying “symmetry algebras” of 
quantum integrable systems, namely quantum groups, and applying this construction to 
elliptic cohomology. On the other hand, G. Koshevoy’s article “Cluster decorated 
geometric crystals, generalized geometric RSK-correspondences, and 
Donaldson-Thomas transformations” is of a more combinatorial nature, developing interesting 
new concepts in the theory of cluster algebras and geometric crystals. 
Guest Editor 

Refereed Articles 


A Metropolis-Hastings-Within-Gibbs 
Sampler for Nonlinear 
Hierarchical-Bayesian Inverse Problems 


Johnathan M. Bardsley and Tiangang Cui 


Abstract We investigate the use of the randomize-then-optimize (RTO) method 
as a proposal distribution for sampling posterior distributions arising in nonlinear, 
hierarchical Bayesian inverse problems. Specifically, we extend the hierarchical 
Gibbs sampler for linear inverse problems to nonlinear inverse problems by 
embedding RTO-MH within the hierarchical Gibbs sampler. We test the method 
on a nonlinear inverse problem arising in differential equations. 


1 Introduction 


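In this paper, we focus on inverse problems of the form 

\[ y = F(u) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \lambda^{-1} I), \tag{1} \]

where $y \in \mathbb{R}^m$ is the observed data, $u \in \mathbb{R}^n$ is the unknown parameter, 
$F : \mathbb{R}^n \to \mathbb{R}^m$ is the forward operator, and $\lambda$ is known as the measurement precision parameter. 
The likelihood function then has the form 

\[ p(y \mid u, \lambda) = (2\pi)^{-m/2} \lambda^{m/2} \exp\left(-\frac{\lambda}{2}\|F(u) - y\|^2\right). \tag{2} \]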
Next, we assume that the prior is a zero-mean Gaussian random vector, 
$u \sim \mathcal{N}(0, (\delta L)^{-1})$, which has distribution 

\[ p(u \mid \delta) = (2\pi)^{-n/2} \delta^{\bar{n}/2} \exp\left(-\frac{\delta}{2} u^T L u\right), \tag{3} \]

where $L$ is defined via a Gaussian Markov random field (GMRF) [1], and $\bar{n}$ is the 
rank of $L$. In the one-dimensional numerical example considered at the end of the 
paper, we choose $L$ to be a discretization of the negative-Laplacian operator. The 
hyper-parameter $\delta$, which is known as the prior precision parameter, provides the 
relative weight given to the prior as compared to the likelihood function. 

In keeping with the Bayesian paradigm, we assume hyper-priors $p(\lambda)$ and $p(\delta)$ 
on $\lambda$ and $\delta$, respectively. A standard choice in the linear Gaussian case is to choose 
Gamma hyper-priors: 

\[ p(\lambda) \propto \lambda^{\alpha_\lambda - 1} \exp(-\beta_\lambda \lambda), \tag{4} \]
\[ p(\delta) \propto \delta^{\alpha_\delta - 1} \exp(-\beta_\delta \delta). \tag{5} \]

This is due to the fact that the conditional densities for $\lambda$ and $\delta$ are then also 
Gamma-distributed (a property known as conjugacy), and hence are easy to sample from. 
We choose hyper-parameters $\alpha_\lambda = \alpha_\delta = 1$ and $\beta_\lambda = \beta_\delta = 10^{-4}$, making the 
hyper-priors exponentially distributed with small decay parameters $\beta_\lambda$ and $\beta_\delta$. In 
the test cases we have considered, these hyper-priors work well, though they should 
be chosen carefully in a particular situation. Specifically, it is important that they are 
chosen to be relatively flat over the regions of high probability for $\lambda$ and $\delta$ defined 
by the posterior density function, so that they are not overly informative. 

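Taking into account the likelihood, the prior, and the hyper-priors, the posterior 
probability density function over all of the unknown parameters is given, by Bayes’ 
law, as 

\[
\begin{aligned}
p(u, \lambda, \delta \mid y) &= p(y \mid u, \lambda)\, p(u \mid \delta)\, p(\lambda)\, p(\delta) / p(y) \\
&\propto \lambda^{m/2 + \alpha_\lambda - 1} \delta^{\bar{n}/2 + \alpha_\delta - 1}
\exp\left(-\frac{\lambda}{2}\|F(u) - y\|^2 - \frac{\delta}{2} u^T L u - \beta_\lambda \lambda - \beta_\delta \delta\right),
\end{aligned}
\tag{6}
\]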


where p(y) is the normalizing constant for the posterior. Our focus in this paper is 
to develop a Gibbs sampler for sampling from the full posterior (6). For this, we 
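need the full conditionals, which are given by 

\[ p(\lambda \mid y, u, \delta) \propto \lambda^{m/2 + \alpha_\lambda - 1} \exp\left(\left[-\frac{1}{2}\|F(u) - y\|^2 - \beta_\lambda\right] \lambda\right), \tag{7} \]
\[ p(\delta \mid y, u, \lambda) \propto \delta^{\bar{n}/2 + \alpha_\delta - 1} \exp\left(\left[-\frac{1}{2} u^T L u - \beta_\delta\right] \delta\right), \tag{8} \]
\[ p(u \mid y, \lambda, \delta) \propto \exp\left(-\frac{\lambda}{2}\|F(u) - y\|^2 - \frac{\delta}{2} u^T L u\right). \tag{9} \]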


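The Gamma hyper-priors are conjugate, and hence the conditional densities for $\lambda$ 
and $\delta$ are also Gamma-distributed: 

\[ \lambda \mid y, u, \delta \sim \Gamma\left(m/2 + \alpha_\lambda,\ \frac{1}{2}\|F(u) - y\|^2 + \beta_\lambda\right), \tag{10} \]
\[ \delta \mid y, u, \lambda \sim \Gamma\left(\bar{n}/2 + \alpha_\delta,\ \frac{1}{2} u^T L u + \beta_\delta\right). \tag{11} \]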


The distributions (10) and (11) are independent, so that 
$p(\lambda, \delta \mid y, u) = p(\lambda \mid y, u)\, p(\delta \mid y, u)$. Hence, computing independent samples from (10) and (11) 
yields a sample from $p(\lambda, \delta \mid y, u)$. Moreover, in the linear case, $F$ is a matrix and 
the Gaussian prior is also conjugate, leading to a Gaussian conditional (9), which 
can be equivalently expressed as 

\[ u \mid y, \lambda, \delta \sim \mathcal{N}\left((\lambda F^T F + \delta L)^{-1} \lambda F^T y,\ (\lambda F^T F + \delta L)^{-1}\right). \]


Taking these observations all together leads to the two-stage Gibbs sampler given 
next, which is also presented in [1, 2]. 


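The Hierarchical Gibbs Sampler, Linear Case 


0. Initialize $(\lambda_0, \delta_0)$, $u^0 = (\lambda_0 F^T F + \delta_0 L)^{-1} \lambda_0 F^T y$, set $k = 1$, define $k_{\text{total}}$. 
1. Compute $(\lambda_k, \delta_k) \sim p(\lambda, \delta \mid y, u^{k-1})$ as follows. 

   a. Compute $\lambda_k \sim \Gamma\left(m/2 + \alpha_\lambda,\ \frac{1}{2}\|F u^{k-1} - y\|^2 + \beta_\lambda\right)$. 
   b. Compute $\delta_k \sim \Gamma\left(\bar{n}/2 + \alpha_\delta,\ \frac{1}{2}(u^{k-1})^T L u^{k-1} + \beta_\delta\right)$. 

2. Compute $u^k \sim \mathcal{N}\left((\lambda_k F^T F + \delta_k L)^{-1} \lambda_k F^T y,\ (\lambda_k F^T F + \delta_k L)^{-1}\right)$. 
3. If $k = k_{\text{total}}$, stop; otherwise, set $k = k + 1$ and return to Step 1. 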
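The steps of the linear-case sampler can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `hierarchical_gibbs_linear`, the toy operator `F`, the tridiagonal `L`, and the starting values for the precisions are all assumptions made here. Note that NumPy parametrizes the Gamma distribution by shape and scale, so the rate in (10) and (11) is inverted.

```python
import numpy as np

def hierarchical_gibbs_linear(F, y, L, n_bar, k_total, alpha=1.0, beta=1e-4,
                              lam0=1.0, delta0=1.0, seed=0):
    """Two-stage Gibbs sampler for y = F u + eps with linear F (Steps 0-3)."""
    rng = np.random.default_rng(seed)
    m, n = F.shape
    FtF, Fty = F.T @ F, F.T @ y
    lam, delta = lam0, delta0
    # Step 0: start from the conditional mean of u given (lam0, delta0).
    u = np.linalg.solve(lam * FtF + delta * L, lam * Fty)
    us, lams, deltas = [], [], []
    for _ in range(k_total):
        # Step 1a/1b: conjugate Gamma draws; NumPy expects scale = 1 / rate.
        rate_lam = 0.5 * np.sum((F @ u - y) ** 2) + beta
        rate_delta = 0.5 * u @ (L @ u) + beta
        lam = rng.gamma(m / 2 + alpha, 1.0 / rate_lam)
        delta = rng.gamma(n_bar / 2 + alpha, 1.0 / rate_delta)
        # Step 2: Gaussian draw with precision matrix P = lam F^T F + delta L.
        P = lam * FtF + delta * L
        C = np.linalg.cholesky(P)                  # P = C C^T
        mean = np.linalg.solve(P, lam * Fty)
        u = mean + np.linalg.solve(C.T, rng.standard_normal(n))  # cov = P^{-1}
        us.append(u)
        lams.append(lam)
        deltas.append(delta)
    return np.array(us), np.array(lams), np.array(deltas)

# Toy problem (hypothetical): a discrete integration operator and a
# negative-Laplacian precision matrix with full rank, so n_bar = n.
rng = np.random.default_rng(1)
n = 20
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
F = np.tril(np.ones((n, n))) / n
u_true = np.sin(np.linspace(0, np.pi, n))
y = F @ u_true + 0.01 * rng.standard_normal(n)
us, lams, deltas = hierarchical_gibbs_linear(F, y, L, n_bar=n, k_total=200)
```

The Cholesky factor is used so that the draw in Step 2 has covariance exactly $(\lambda_k F^T F + \delta_k L)^{-1}$ without forming the inverse explicitly.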


When $F$ is nonlinear, the conditional density $p(u \mid y, \lambda, \delta)$, defined in (9), is no 
longer Gaussian and we cannot sample from it directly. To overcome this, we embed 
a Metropolis-Hastings (MH) step within step 2 of hierarchical Gibbs, as advocated 
in [4, Algorithm A.43]. For the MH proposal, we use the randomize-then-optimize 
(RTO) method [3], and thus we begin in Sect. 2 by describing the RTO proposal. In Sect. 3, 
we describe RTO-MH and its embedding within hierarchical Gibbs for sampling 
from the full posterior (6). Finally, in Sect. 4, we use RTO-MH-within-hierarchical Gibbs 
to sample from (6) in a specific nonlinear inverse problem arising in differential 
equations. Concluding remarks are provided in Sect. 5. 


2 The Randomize-Then-Optimize Proposal Density 


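We first define the augmented forward model and observation taking the form 

\[
F_{\lambda,\delta}(u) \overset{\text{def}}{=} \begin{bmatrix} \lambda^{1/2} F(u) \\ \delta^{1/2} L^{1/2} u \end{bmatrix}
\quad \text{and} \quad
y_{\lambda,\delta} \overset{\text{def}}{=} \begin{bmatrix} \lambda^{1/2} y \\ 0 \end{bmatrix}.
\]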


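For motivation, note that in the linear case, $p(u \mid y, \lambda, \delta)$ is Gaussian and can be 
sampled by solving the stochastic least squares problem 

\[ u \mid y, \lambda, \delta = \arg\min_{\psi} \|F_{\lambda,\delta}\psi - (y_{\lambda,\delta} + \epsilon)\|^2, \qquad \epsilon \sim \mathcal{N}(0, I). \tag{12} \]

This follows from the fact that if $F_{\lambda,\delta} = Q_{\lambda,\delta} R_{\lambda,\delta}$ is the thin (or condensed) 
QR-factorization of $F_{\lambda,\delta}$, and $F_{\lambda,\delta}$ has full column rank, then $Q_{\lambda,\delta} \in \mathbb{R}^{(m+n) \times n}$ has 
orthonormal columns spanning the column space of $F_{\lambda,\delta}$; $R_{\lambda,\delta} \in \mathbb{R}^{n \times n}$ is 
upper-triangular and invertible; and the solution of (12) is unique and can be expressed as 

\[ Q_{\lambda,\delta}^T F_{\lambda,\delta}(u \mid y, \lambda, \delta) = Q_{\lambda,\delta}^T (y_{\lambda,\delta} + \epsilon), \qquad \epsilon \sim \mathcal{N}(0, I). \tag{13} \]

Note that in the linear case $Q_{\lambda,\delta}^T F_{\lambda,\delta} = R_{\lambda,\delta}$, and it follows that (13) yields samples 
from $p(u \mid y, \lambda, \delta)$. 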
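The last claim can be checked directly. The short derivation below is a sketch added for illustration; it assumes the augmented quantities are $F_{\lambda,\delta} = [\lambda^{1/2} F;\ \delta^{1/2} L^{1/2}]$ and $y_{\lambda,\delta} = [\lambda^{1/2} y;\ 0]$ with $(L^{1/2})^T L^{1/2} = L$, and it drops the subscripts on $Q$ and $R$ for readability.

```latex
\begin{align*}
R u = Q^T (y_{\lambda,\delta} + \epsilon)
  \quad &\Longrightarrow \quad
  u = R^{-1} Q^T y_{\lambda,\delta} + R^{-1} Q^T \epsilon, \\
\mathbb{E}[u] &= R^{-1} Q^T y_{\lambda,\delta}
  = (F_{\lambda,\delta}^T F_{\lambda,\delta})^{-1} F_{\lambda,\delta}^T y_{\lambda,\delta}
  = (\lambda F^T F + \delta L)^{-1} \lambda F^T y, \\
\operatorname{Cov}[u] &= R^{-1} Q^T Q R^{-T}
  = (R^T R)^{-1}
  = (F_{\lambda,\delta}^T F_{\lambda,\delta})^{-1}
  = (\lambda F^T F + \delta L)^{-1}.
\end{align*}
```

Since $u$ is an affine function of the Gaussian vector $\epsilon$, it is Gaussian with this mean and covariance, which is exactly the conditional stated for the linear case.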

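In the nonlinear case, Eq. (13) can still be used, but the resulting samples do not 
have distribution $p(u \mid y, \lambda, \delta)$. To derive the form of the distribution, we first define 

\[ r_{\lambda,\delta}(u) \overset{\text{def}}{=} F_{\lambda,\delta}(u) - y_{\lambda,\delta} \]

and denote the Jacobian of $F_{\lambda,\delta}$, evaluated at $u$, by $J_{\lambda,\delta}(u)$. Then, provided $Q_{\lambda,\delta}^T F_{\lambda,\delta}$ 
is a one-to-one function with continuous first partial derivatives, and its Jacobian, 
$Q_{\lambda,\delta}^T J_{\lambda,\delta}$, is invertible, the probability density function for $u \mid y, \lambda, \delta$ defined by (13) 
is 

\[
\begin{aligned}
p_{\text{RTO}}(u \mid y, \lambda, \delta) &\propto \left|\det\left(Q_{\lambda,\delta}^T J_{\lambda,\delta}(u)\right)\right| \exp\left(-\frac{1}{2}\|Q_{\lambda,\delta}^T r_{\lambda,\delta}(u)\|^2\right) \\
&\propto c_{\lambda,\delta}(u)\, p(u \mid y, \lambda, \delta),
\end{aligned}
\tag{14}
\]

where 

\[ c_{\lambda,\delta}(u) = \left|\det\left(Q_{\lambda,\delta}^T J_{\lambda,\delta}(u)\right)\right| \exp\left(\frac{1}{2}\|r_{\lambda,\delta}(u)\|^2 - \frac{1}{2}\|Q_{\lambda,\delta}^T r_{\lambda,\delta}(u)\|^2\right). \tag{15} \]

There is flexibility in how to choose $Q_{\lambda,\delta} \in \mathbb{R}^{(m+n) \times n}$, though $Q_{\lambda,\delta}^T F_{\lambda,\delta}$ must 
satisfy the conditions mentioned in the previous paragraph. In our implementations 
of RTO, we have used $Q_{\lambda,\delta}$ from the thin QR-factorization $J_{\lambda,\delta}(u_{\lambda,\delta}) = Q_{\lambda,\delta} R_{\lambda,\delta}$, 
where $u_{\lambda,\delta}$ is the MAP estimator, i.e., $u_{\lambda,\delta} = \arg\min_u \|F_{\lambda,\delta}(u) - y_{\lambda,\delta}\|^2$. 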

In practice, we compute samples from (13) by solving the stochastic optimization 
problem 

\[ u^* = \arg\min_{\psi} \frac{1}{2}\left\|Q_{\lambda,\delta}^T \left(F_{\lambda,\delta}(\psi) - (y_{\lambda,\delta} + \epsilon^*)\right)\right\|^2, \qquad \epsilon^* \sim \mathcal{N}(0, I). \tag{16} \]

The name randomize-then-optimize stems from (16), where $y_{\lambda,\delta}$ is first ‘randomized’, 
by adding $\epsilon^*$, and then ‘optimized’, by solving (16). Finally, we note that if 
the cost function minimum in (16) is greater than zero, (13) has no solution, and we 
must discard the corresponding sample. In practice, we discard solutions $u^*$ of (16) 
with a cost function minimum greater than $\eta = 10^{-8}$, though we have found this to 
occur very rarely. 
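A minimal sketch of one RTO draw, using SciPy's `least_squares` as the optimizer, is given below. It is an illustration under stated assumptions rather than the authors' code: the helper names `rto_setup` and `rto_sample`, the toy cubic forward model, the choice $L = I$ (so $L^{1/2} = I$), and the solver tolerances are all hypothetical. The `cost` attribute returned by `least_squares` is half the sum of squared residuals, which matches the factor $\frac{1}{2}$ in (16).

```python
import numpy as np
from scipy.optimize import least_squares

def rto_setup(F_aug, J_aug, y_aug, u_init):
    """Compute the MAP point and the thin-QR factor Q of the Jacobian there."""
    res = least_squares(lambda u: F_aug(u) - y_aug, u_init, jac=J_aug)
    Q, _ = np.linalg.qr(J_aug(res.x))          # reduced QR: Q is (m+n) x n
    return res.x, Q

def rto_sample(F_aug, J_aug, y_aug, Q, u_map, rng, eta=1e-8):
    """Solve (16) for one noise draw; return None if the minimum exceeds eta."""
    eps = rng.standard_normal(y_aug.size)      # the 'randomize' step
    resid = lambda u: Q.T @ (F_aug(u) - (y_aug + eps))
    res = least_squares(resid, u_map, jac=lambda u: Q.T @ J_aug(u),
                        xtol=1e-12, ftol=1e-12, gtol=1e-12)
    return res.x if res.cost <= eta else None

# Toy nonlinear problem (hypothetical): F(u) = u + 0.1 u^3, L = I.
rng = np.random.default_rng(0)
lam, delta = 100.0, 1.0
u_true = np.array([0.5, -0.3, 0.8])
y = u_true + 0.1 * u_true**3 + 0.1 * rng.standard_normal(3)
y_aug = np.concatenate([np.sqrt(lam) * y, np.zeros(3)])
F_aug = lambda u: np.concatenate([np.sqrt(lam) * (u + 0.1 * u**3),
                                  np.sqrt(delta) * u])
J_aug = lambda u: np.vstack([np.sqrt(lam) * np.diag(1 + 0.3 * u**2),
                             np.sqrt(delta) * np.eye(3)])

u_map, Q = rto_setup(F_aug, J_aug, y_aug, np.zeros(3))
u_star = rto_sample(F_aug, J_aug, y_aug, Q, u_map, rng)
```

Starting the stochastic solve from the MAP point is a design choice made here for convenience; any starting point from which the optimizer converges would do.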


3  RTO-Metropolis-Hastings and Its Embedding Within 
Hierarchical Gibbs 


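Although RTO does not yield samples from $p(u \mid y, \lambda, \delta)$ for nonlinear problems, 
it can be used as a proposal for MH. At step $k$ of the MH algorithm, given the 
current sample $u^{k-1}$, one can use (16) to compute $u^* \sim p_{\text{RTO}}(u \mid y, \lambda, \delta)$ and then 
set $u^k = u^*$ with probability 

\[
\begin{aligned}
r_{\lambda,\delta} &= \min\left(1,\ \frac{p(u^* \mid y, \lambda, \delta)\, p_{\text{RTO}}(u^{k-1} \mid y, \lambda, \delta)}{p(u^{k-1} \mid y, \lambda, \delta)\, p_{\text{RTO}}(u^* \mid y, \lambda, \delta)}\right) \\
&= \min\left(1,\ \frac{p(u^* \mid y, \lambda, \delta)\, c_{\lambda,\delta}(u^{k-1})\, p(u^{k-1} \mid y, \lambda, \delta)}{p(u^{k-1} \mid y, \lambda, \delta)\, c_{\lambda,\delta}(u^*)\, p(u^* \mid y, \lambda, \delta)}\right) \\
&= \min\left(1,\ \frac{c_{\lambda,\delta}(u^{k-1})}{c_{\lambda,\delta}(u^*)}\right).
\end{aligned}
\tag{17}
\]

Note that it is often advantageous, for numerical reasons, to replace the ratio in (17) 
by the equivalent expression 

\[ c_{\lambda,\delta}(u^{k-1}) / c_{\lambda,\delta}(u^*) = \exp\left(\ln c_{\lambda,\delta}(u^{k-1}) - \ln c_{\lambda,\delta}(u^*)\right), \]

where 

\[ \ln c_{\lambda,\delta}(u) \simeq \ln\left|\det\left(Q_{\lambda,\delta}^T J_{\lambda,\delta}(u)\right)\right| + \frac{1}{2}\|r_{\lambda,\delta}(u)\|^2 - \frac{1}{2}\|Q_{\lambda,\delta}^T r_{\lambda,\delta}(u)\|^2, \]

and ‘$\simeq$’ denotes ‘equal up to an additive, unimportant constant.’ 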
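The RTO-MH Algorithm 


1. Choose initial vector $u^0$, parameter $0 < \eta \approx 0$, and number of samples $N$. Set $k = 1$. 
2. Compute $u^* \sim p_{\text{RTO}}(u \mid y, \lambda, \delta)$ by solving (16) for a fixed realization 
   $\epsilon^* \sim \mathcal{N}(0, I)$. If 

   \[ \left\|Q_{\lambda,\delta}^T \left(F_{\lambda,\delta}(u^*) - (y_{\lambda,\delta} + \epsilon^*)\right)\right\|^2 > \eta, \]

   then repeat step 2. 
3. Set $u^k = u^*$ with probability $r_{\lambda,\delta}$ defined by (17). Else, set $u^k = u^{k-1}$. 
4. If $k < N$, set $k = k + 1$ and return to Step 2, otherwise stop. 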


The proposed sample $u^*$ is independent of $u^{k-1}$, making RTO-MH an 
independence MH method. Thus, we can apply [4, Theorem 7.8] to obtain the 
result that RTO-MH will produce a uniformly ergodic chain that converges in 
distribution to $p(u \mid y, \lambda, \delta)$ provided there exists $M > 0$ such that 
$p(u \mid y, \lambda, \delta) \le M \cdot p_{\text{RTO}}(u \mid y, \lambda, \delta)$ for all $u \in \mathbb{R}^n$. Given (14), this inequality holds if and only if 
$c_{\lambda,\delta}(u)$, defined by (15), is bounded away from zero for all $u$. 
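The accept/reject step with the log form of the ratio in (17) can be sketched as follows. The function names `log_c` and `mh_accept` are hypothetical, and `F_aug`, `J_aug`, `y_aug` and `Q` stand for the augmented model, its Jacobian, the augmented data and the fixed QR factor; the small linear check at the end is an assumption-laden illustration in which $c_{\lambda,\delta}$ is constant, so every proposal is accepted.

```python
import numpy as np

def log_c(u, F_aug, J_aug, y_aug, Q):
    """ln c(u) = ln|det(Q^T J(u))| + 0.5*||r(u)||^2 - 0.5*||Q^T r(u)||^2."""
    r = F_aug(u) - y_aug
    _, logabsdet = np.linalg.slogdet(Q.T @ J_aug(u))   # stable log|det|
    Qtr = Q.T @ r
    return logabsdet + 0.5 * (r @ r) - 0.5 * (Qtr @ Qtr)

def mh_accept(u_prev, u_star, log_c_fn, rng):
    """Return u_star with probability min(1, c(u_prev) / c(u_star))."""
    log_ratio = log_c_fn(u_prev) - log_c_fn(u_star)
    log_unif = np.log1p(-rng.uniform())                # log of a value in (0, 1]
    return u_star if log_unif < min(0.0, log_ratio) else u_prev

# Linear sanity check (hypothetical data): with F_aug(u) = A u and Q taken
# from the thin QR of A, ln c does not depend on u.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
y_aug = rng.standard_normal(5)
Q, _ = np.linalg.qr(A)
lc = lambda u: log_c(u, lambda v: A @ v, lambda v: A, y_aug, Q)
u1, u2 = rng.standard_normal(3), rng.standard_normal(3)
same = np.isclose(lc(u1), lc(u2))
```

Working with `slogdet` and log ratios avoids overflow in the exponential terms when the residual norms are large.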


3.1 RTO-MH-Within-Hierarchical Gibbs 


In the hierarchical setting, we embed a single RTO-MH step within the hierarchical 
Gibbs sampler, to obtain the following MCMC method. 


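RTO-MH-Within-Hierarchical Gibbs 

0. Initialize $(\lambda_0, \delta_0)$, set $k = 1$, define $k_{\text{total}}$, and set 

   \[ u^0 = \arg\min_u \|F_{\lambda_0,\delta_0}(u) - y_{\lambda_0,\delta_0}\|^2. \]

1. Simulate $(\lambda_k, \delta_k) \sim p(\lambda, \delta \mid y, u^{k-1})$ as follows. 

   a. Compute $\lambda_k \sim \Gamma\left(m/2 + \alpha_\lambda,\ \frac{1}{2}\|F(u^{k-1}) - y\|^2 + \beta_\lambda\right)$. 
   b. Compute $\delta_k \sim \Gamma\left(\bar{n}/2 + \alpha_\delta,\ \frac{1}{2}(u^{k-1})^T L u^{k-1} + \beta_\delta\right)$. 

2. Simulate $u^k$ using RTO as follows. 

   a. Compute $u^* \sim p_{\text{RTO}}(u \mid y, \lambda_k, \delta_k)$ by solving (16) with $(\lambda, \delta) = (\lambda_k, \delta_k)$. 
   b. Set $u^k = u^*$ with probability $r_{\lambda_k,\delta_k}$ defined by (17), else set $u^k = u^{k-1}$. 

3. If $k = k_{\text{total}}$, stop; otherwise, set $k = k + 1$ and return to Step 1. 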
In step 2a, note that two optimization problems must be solved. First, the MAP 
estimator $u_{\lambda_k,\delta_k}$ is computed; then the QR-factorization $J(u_{\lambda_k,\delta_k}) = Q_{\lambda_k,\delta_k} R_{\lambda_k,\delta_k}$ 
is computed; and finally, the stochastic optimization problem (16) is solved, with 
$(\lambda, \delta) = (\lambda_k, \delta_k)$, to obtain the RTO sample $u^*$. One could take multiple RTO-MH 
steps in Step 2, within each outer loop, to improve the chances of updating $u^{k-1}$, 
but we do not implement that here. 


4 Numerical Experiment 


To test RTO-MH-within-hierarchical Gibbs, we consider a nonlinear inverse 
problem from [1, Chapter 6]. The inverse problem is to estimate the diffusion coefficient 
$u(s)$ from measurements of the solution $x(s)$ of the Poisson equation 

\[ -\frac{d}{ds}\left(u(s)\frac{dx}{ds}\right) = f(s), \qquad 0 < s < 1, \tag{18} \]

with zero boundary conditions $x(0) = x(1) = 0$. Assuming a uniform mesh on 
$[0, 1]$, after numerical discretization, (18) takes the form 

\[ B(u)x = f, \qquad B(u) \overset{\text{def}}{=} D^T \operatorname{diag}(u) D, \tag{19} \]

where $u \in \mathbb{R}^n$ and $D \in \mathbb{R}^{n \times (n-1)}$ is a discrete derivative matrix. To generate data, 
we compute numerical solutions corresponding to two discrete Dirac delta forcing 
functions, $f_1(s)$ and $f_2(s)$, centered at $s = 1/3$ and $s = 2/3$, respectively. After 
discretization, $f_1$ and $f_2$ become $(n - 1) \times 1$ Kronecker delta vectors $f_1$ and $f_2$, and 
the measurement model takes the form of (1) with 

\[
y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \in \mathbb{R}^{2n-2}
\quad \text{and} \quad
F(u) = \begin{bmatrix} B(u)^{-1} f_1 \\ B(u)^{-1} f_2 \end{bmatrix} \in \mathbb{R}^{2n-2},
\]

so that $m = 2n - 2$. We generate data using (1) with $n = 50$ and $u_{\text{true}}$ obtained by 
discretizing 

\[ u(s) = \min\{1,\ 1 - 0.5 \sin(2\pi(s - 0.25))\}, \]

and $\lambda^{-1}$ chosen so that the signal-to-noise ratio, $\|F(u_{\text{true}})\| / \sqrt{m\lambda^{-1}}$, is 100. The data 
vectors $y_1$ and $y_2$ are plotted in Fig. 1a together with the noise-free data $B(u_{\text{true}})^{-1} f_1$ 
and $B(u_{\text{true}})^{-1} f_2$. 
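The data-generation setup can be sketched in NumPy as follows. The particular finite-difference matrix `D`, the mesh width `h = 1/n`, the use of cell midpoints for discretizing $u(s)$, and the placement of the delta vectors at the nodes nearest $1/3$ and $2/3$ are assumptions made for this illustration; the text only specifies the shapes and the form $B(u) = D^T \operatorname{diag}(u) D$.

```python
import numpy as np

n = 50
h = 1.0 / n
# Assumed discretization: D maps the n-1 interior nodal values (zero boundary
# values implied) to n cell-wise differences, so D has shape (n, n-1).
D = (np.eye(n, n - 1) - np.eye(n, n - 1, k=-1)) / h

def B(u):
    """Stiffness-type matrix D^T diag(u) D, shape (n-1, n-1)."""
    return D.T @ (u[:, None] * D)

# Kronecker delta forcing vectors at the interior nodes nearest 1/3 and 2/3.
nodes = np.arange(1, n) * h
f1 = np.zeros(n - 1)
f1[np.argmin(np.abs(nodes - 1 / 3))] = 1.0
f2 = np.zeros(n - 1)
f2[np.argmin(np.abs(nodes - 2 / 3))] = 1.0

def F(u):
    """Forward map: stacked solutions of B(u) x = f1 and B(u) x = f2."""
    Bu = B(u)
    return np.concatenate([np.linalg.solve(Bu, f1), np.linalg.solve(Bu, f2)])

# True diffusion coefficient sampled at cell midpoints (an assumption).
s_mid = (np.arange(n) + 0.5) * h
u_true = np.minimum(1.0, 1.0 - 0.5 * np.sin(2 * np.pi * (s_mid - 0.25)))

m = 2 * n - 2
clean = F(u_true)
# Choose lambda^{-1} so that ||F(u_true)|| / sqrt(m * lambda^{-1}) = 100.
noise_var = (np.linalg.norm(clean) / 100.0) ** 2 / m
rng = np.random.default_rng(0)
y = clean + np.sqrt(noise_var) * rng.standard_normal(m)
```

Because $u_{\text{true}}$ is strictly positive, $B(u_{\text{true}})$ is symmetric positive definite and both linear solves are well posed.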
Fig. 1 (a) Plots of the measured data $y_1$ and $y_2$, the true state $x$, and the model 
fits (panel title: data (o), true model (-), and model fit (--)). (b) Plots of the true 
diffusion coefficient $u$ together with the RTO-MH sample median and the 
element-wise 95% credibility bounds 


With the measurements in hand, we implement RTO-MH-within-hierarchical 
Gibbs for sampling from (6). The results are plotted in Figs. 1 and 2. In Fig. 1b, we 
see the sample median together with 95% credibility intervals computed from the 
$u$-chain generated by the MCMC method. In Fig. 2a, we plot the individual chains 
for $\lambda$, $\delta$, and a randomly chosen element of the $u$-chain. And finally, in Fig. 2b, we 
plot the autocorrelation functions and associated integrated autocorrelation times 
($\tau_{\text{int}}$) for these three parameters [1]. 
Fig. 2 (a) Plots of the chains for three randomly selected elements of x. (b) Autocorrelation times 
associated with these chains 


5 Conclusions 


In this paper, we have tackled the problem of sampling from the full posterior (6) 
when $F$ is a nonlinear function. To do this, we followed the same approach as 
the hierarchical Gibbs algorithm of [2]. In that algorithm, however, $F$ is linear, the 
conditional density $p(u \mid y, \lambda, \delta)$ is Gaussian, and hence samples from $p(u \mid y, \lambda, \delta)$ 
can be computed by solving a linear system of equations. In the nonlinear case, 
$p(u \mid y, \lambda, \delta)$ is non-Gaussian, but we can use RTO-MH to obtain samples, as 
described in [3]. We obtain an MH-within-Gibbs method for sampling from (6) 
by embedding a single RTO-MH step within hierarchical Gibbs. We then tested the 
method on a nonlinear inverse problem arising in differential equations and found 
that it worked well. 
Sequential Bayesian Inference for 
Dynamical Systems Using the Finite 
Volume Method 


Colin Fox, Richard A. Norton, Malcolm E. K. Morrison, 
and Timothy C. A. Molteno 


Abstract Optimal Bayesian sequential inference, or filtering, for the state of 
a deterministic dynamical system requires simulation of the Frobenius-Perron 
operator, that can be formulated as the solution of an initial value problem in the 
continuity equation on filtering distributions. For low-dimensional, smooth systems 
the finite-volume method is an effective solver that conserves probability and 
gives estimates that converge to the optimal continuous-time values. A 
Courant–Friedrichs–Lewy condition assures that intermediate discretized solutions remain 
positive density functions. We demonstrate this finite-volume filter (FVF) in a 
simulated example of filtering for the state of a pendulum, including a case where 
rank-deficient observations lead to multi-modal probability distributions. 


1 Introduction 


In 2011, one of us (TCAM) offered to improve the speed and accuracy of the 
Tru-Test scales for ‘walk over weighing’ (WOW) of cattle, and wagered a beer 
on the outcome [8]. Tru-Test is a company based in Auckland, New Zealand, 
that manufactures measurement and productivity tools for the farming industry, 
particularly for dairy. Tru-Test’s XR3000 WOW system was already in the market, 
though they could see room for improvement in terms of on-farm usage, as well as 
speed and accuracy. Indeed, advertising material for the XR3000 stated that WOW 


requires that the animals pass the platform regularly and smoothly 


which hinted at the existing processing requiring somewhat constrained movement 
by the cows for it to deliver a weight. 
Fig. 1 A dairy cow walking 
over a weigh bridge placed 
near the milking shed 
(photo credit: Wayne 
Johnson/Pizzini Productions) 


Figure 1 shows a cow walking over the weigh-bridge in the WOW system located 
on the ground in the path from a milking shed. The weigh bridge consists of a 
low platform with strain gauges beneath the platform, at each end, that are used to 
measure a time series of downward force from which weight (more correctly, mass) 
of the cow is derived. 

The plan, for improving estimates of cow mass from strain-gauge time series, 
was to apply Bayesian modeling and computational inference. Bayesian inference 
allows uncertain measurements to be modeled in terms of probability distributions, 
and interpreted in terms of physical models that describe how the data is produced. 
This leads to estimates of parameters in the model, such as the mass of a cow, and 
meaningful uncertainty quantification on those estimates. At the outset we devel- 
oped dynamical-systems models for the moving cow, with some models looking 
like one or more pogo sticks. Operation in real-time would require developing new 
algorithms for performing the inference sequentially—as the data arrives—and new 
hardware with sufficient computing speed to implement those algorithms. Figure 2 
(right) shows hardware developed for this application, that includes strain-gauge 
signal conditioning, digitization, and an embedded ARM processor, alongside the 
XR3000 electronics and display (left). 

This paper describes an algorithm for optimal sequential Bayesian inference 
that we developed in response to this application in cow weighing. We first give 
a stylized model of WOW, then a method for optimal filtering for tracking the state 
of a nonlinear dynamical system, then present numerical examples for the stylized 
model. 
Fig. 2 WOW hardware in 2016: existing commercial unit (left), and prototype with embedded 
processing (right) (photo credit and hardware design: Phill Brown) 


Fig. 3 A pendulum of length $l$ with mass $m$, undergoing motion with angular displacement $\theta$ and 
angular velocity $\omega$. The force $F$ in the string is measured 


1.1 A Stylized Problem 


A (very) stylized model of WOW is the problem of tracking a simple pendulum of 
length $l$ and mass $m$ when only the force $F$ in the string is measured, as depicted in 
Fig. 3. For this system the kinematic variables are the angular displacement (from 
the vertical downwards) $\theta$ and the angular velocity $\omega$. The kinematic state $(\theta, \omega)$ 
evolves according to 

\[ \frac{d}{dt}(\theta, \omega) = \left(\omega,\ -\frac{g}{l}\sin\theta\right), \]

where $g$ is the acceleration due to gravity and $l$ is the length of the pendulum. 
The force $F_k$ is measured at times $t_k$, $k = 1, 2, 3, \ldots$, with the (noise free) value 
related to the state variables by 

\[ F_k = m l\, \omega^2(t_k) + m g \cos\theta(t_k). \]

Estimation of the parameter $m$ may be performed by considering the augmented 
system 

\[ \frac{d}{dt}(\theta, \omega, m) = \left(\omega,\ -\frac{g}{l}\sin\theta,\ 0\right). \]

See [7] for a computed example of parameter estimation in this system. 
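To make the stylized model concrete, the following sketch integrates the augmented system with a classical fourth-order Runge-Kutta step and records noise-free force values. The numerical values of `g`, `l` and `m`, the initial angle, the step size and the observation spacing are illustrative assumptions, not values taken from the text, and the integrator is a generic choice rather than the finite-volume method developed in this paper.

```python
import numpy as np

g, l, m = 9.81, 1.0, 2.0          # illustrative values (assumed)

def velocity(x):
    """Velocity field of the augmented state x = (theta, omega, m)."""
    theta, omega, _ = x
    return np.array([omega, -(g / l) * np.sin(theta), 0.0])

def rk4_step(x, dt):
    """One classical Runge-Kutta step for dx/dt = velocity(x)."""
    k1 = velocity(x)
    k2 = velocity(x + 0.5 * dt * k1)
    k3 = velocity(x + 0.5 * dt * k2)
    k4 = velocity(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def force(x):
    """Noise-free string force: m*l*omega^2 + m*g*cos(theta)."""
    theta, omega, mass = x
    return mass * l * omega**2 + mass * g * np.cos(theta)

def energy(x):
    """Energy per unit (m*l^2), conserved by the exact flow."""
    theta, omega, _ = x
    return 0.5 * omega**2 - (g / l) * np.cos(theta)

x = np.array([0.2, 0.0, m])       # released from rest at theta = 0.2 rad
e0 = energy(x)
dt, steps_per_obs = 0.001, 100
forces = []
for k in range(20):               # 20 observation times, 0.1 time units apart
    for _ in range(steps_per_obs):
        x = rk4_step(x, dt)
    forces.append(force(x))
forces = np.array(forces)
```

The third component of the state has zero velocity, so the mass is carried along unchanged, which is what makes it estimable as part of the state.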


2 Sequential Bayesian Inference for Dynamical Systems 


Consider now a general dynamical system that evolves according to the 
(autonomous) differential equation 

\[ \frac{d}{dt} x = f(x), \tag{1} \]

where $f$ is a known velocity field and $x(t)$ is the state vector of the system at time 
$t$. Given an initial state $x(0) = x_0$ at $t = 0$, Eq. (1) may be solved to determine 
the future state $x(t)$, $t > 0$, that we also write $x(t; x_0)$ to denote this deterministic 
solution. 

At increasing discrete times $t_k$, $k = 1, 2, 3, \ldots$, the system is observed, returning 
measurement $z_k$ that provides noisy and incomplete information about $x_k = x(t_k)$. 
We assume that we know the conditional distribution over observed value $z_k$, given 
the state $x_k$, 

\[ p(z_k \mid x_k). \]

Let $Z_t = \{z_k : t_k \le t\}$ denote the set of observations up to time $t$, and let 
(the random variable) $x_t = x(t)$ denote the unknown state at time $t$. The formal 
Bayesian solution corresponds to determining the time-varying sequence of filtering 
distributions 

\[ p(x_t \mid Z_t) \tag{2} \]

over the state at time $t$ conditioned on all available measurements to time $t$. 
Discrete-Time Formulation A standard approach [1] is to discretize the system 
equation (1) and treat the discrete-time system [3]. When uncertainty in $f$ is 
included via ‘process noise’ $v_k$, and observation errors via ‘observation noise’ $n_k$, the 
discrete-time problem is written as 

\[ x_k = f_k(x_{k-1}, v_k), \]
\[ z_k = h_k(x_k, n_k), \]

with functions $f_k$ and $h_k$ assumed known. 

When the random processes $v_k$ and $n_k$ are independently distributed from the 
current and previous states, the system equation defines a Markov process, as does 
Eq. (1), while the observation equation defines the conditional probability $p(z_k \mid x_k)$. 

We will treat the continuous-time problem directly, defining a family of numer- 
ical approximations that converge in distribution to the desired continuous-time 
distributions. 


Continuous-Time Bayesian Filtering Sequential Bayesian inference iterates two 
steps to generate the filtering distributions in Eq. (2) [5]. 


Prediction Between measurement times $t_k$ and $t_{k+1}$, $Z_t$ is constant and the
continuous-time evolution of the filtering distribution may be derived from the
(forward) Chapman-Kolmogorov equation

$$\begin{aligned}
p(x_{t+\Delta t} | Z_{t+\Delta t}) = p(x_{t+\Delta t} | Z_t)
  &= \int p(x_{t+\Delta t} | x_t, Z_t)\, p(x_t | Z_t)\, dx_t \\
  &= \int \delta\big(x_{t+\Delta t} - x(\Delta t; x_t)\big)\, p(x_t | Z_t)\, dx_t,
\end{aligned}$$

which defines a linear operator on the space of probability distributions,

$$S_{\Delta t} : p(x_t | Z_t) \mapsto p(x_{t+\Delta t} | Z_t). \tag{3}$$

$S_{\Delta t}$ is the Frobenius-Perron (or Foias) operator for time increment $\Delta t$.


Update At measurement times $t_k$, $Z_t$ changes from $Z_{k-1}$ to $Z_k$, and the filtering
distribution changes, typically discontinuously, as

$$p(x_k | Z_k) = \frac{p(z_k | x_k)\, p(x_k | Z_{k-1})}{p(z_k | Z_{k-1})}, \tag{4}$$

which is simply Bayes’ rule written at observation time $t_k$. We have written $x_k = x_{t_k}$
and $Z_k = Z_{t_k}$, and used conditional independence of $z_k$ and $Z_{k-1}$ given $x_k$.

Fig. 4 A schematic of probability flux in region $(x, x + dx)$, and for time $(t, t + dt)$. The schematic
shows greater flux exiting the region than entering; correspondingly the pdf at $t + dt$ is decreased
with respect to the pdf at $t$
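
The update in Eq. (4) is a pointwise product followed by a normalization. The following is a minimal sketch on a one-dimensional grid of cell centres, assuming a scalar state, a Gaussian prior and a hypothetical measurement model $z = x + \text{noise}$. The names `bayes_update` and `gaussian`, the grid and the measurement value are illustrative assumptions, not taken from the source.

```python
import math

def gaussian(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def bayes_update(prior, likelihood, dx):
    """Pointwise Bayes update of a gridded pdf, cf. Eq. (4)."""
    unnorm = [l * p for l, p in zip(likelihood, prior)]
    # mid-point quadrature for the evidence p(z_k | Z_{k-1})
    evidence = sum(unnorm) * dx
    return [u / evidence for u in unnorm]

# cell centres of a uniform grid on [-5, 5] (assumed domain)
n = 200
dx = 10.0 / n
xs = [-5.0 + (i + 0.5) * dx for i in range(n)]

prior = [gaussian(x, 0.0, 1.0) for x in xs]     # p(x_k | Z_{k-1})
z = 1.0                                         # hypothetical measurement
likelihood = [gaussian(z, x, 0.5) for x in xs]  # p(z_k | x_k)

posterior = bayes_update(prior, likelihood, dx)
post_mean = sum(x * p for x, p in zip(xs, posterior)) * dx
```

For this conjugate Gaussian case the exact posterior mean is $z / (1 + 0.5^2) = 0.8$, which gives a simple check on the quadrature.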


2.1 The Frobenius-Perron Operator is a PDE 


The Frobenius-Perron operator in Eq. (3), which evolves the filtering density forward
in time, may be written as the solution of an initial value problem (IVP) in a partial
differential equation (PDE) for the probability density function (pdf).

For pdf $p(x; t)$ over state $x$ and depending on time $t$, the velocity field $f(x)$
implies a flux of probability equal to $p(x; t) f(x)$. Figure 4 shows a schematic of the
pdf and probability flux in region $(x, x + dx)$, and for the time interval $(t, t + dt)$.
Equating the rate of change in the pdf with the rate at which probability mass enters
the region, and taking $dx, dt \to 0$, gives the continuity equation

$$\frac{\partial p}{\partial t} = -\nabla \cdot (f p). \tag{5}$$

The Frobenius-Perron operator $S_{\Delta t}$, for time interval $\Delta t$, may be simulated by
solving the PDE (5) with initial condition $p(x; 0) = p(x_t | Z_t)$ to evaluate
$p(x; \Delta t) = p(x_{t+\Delta t} | Z_t)$.
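
For a scalar state, the flux balance behind the continuity equation can be spelled out in two lines. This is only a sketch of the standard argument, assuming $p$ and $f$ are smooth enough to differentiate.

```latex
% probability mass in (x, x + dx) changes only through the flux at its two ends
\big[p(x; t + dt) - p(x; t)\big]\, dx
  = \big[p(x; t) f(x) - p(x + dx; t) f(x + dx)\big]\, dt
% divide by dx\, dt and let dx, dt \to 0
\frac{\partial p}{\partial t} = -\frac{\partial}{\partial x}\big(f p\big)
```

In $d$ dimensions the same balance over a small box gives the divergence form of the continuity equation.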

Equation (5) is a linear advection equation. When the state equation has additive 
stochastic forcing, as is often used to represent model error, evolution of the filtering 
pdf is governed by a linear advection-diffusion (Fokker-Planck) equation. 


3 Finite Volume Solver 


The finite volume method (FVM) discretizes the continuity equation in its integral
form, for each ‘cell’ $K$ in a mesh,

$$\frac{\partial}{\partial t} \int_K p \, dx + \oint_{\partial K} p \, f \cdot \hat{n} \, dS = 0.$$

Write $L \sim K$ if cells $L$ and $K$ share a common interface, denoted $E_{KL}$, and denote
by $\hat{n}_{KL}$ the unit normal on $E_{KL}$, directed from $K$ to $L$. Define the initial vector of
cell values by $P_K^0 = \frac{1}{|K|} \int_K p(x; 0)\, dx$, then for $m = 0, 1, \ldots, r$ compute $P^{m+1}$ as

$$\frac{P_K^{m+1} - P_K^m}{\Delta t} + \frac{1}{|K|} \sum_{L \sim K} f_{KL} P_{KL}^m = 0,$$

where

$$f_{KL} = \int_{E_{KL}} f \cdot \hat{n}_{KL} \, dS \quad \text{and} \quad
P_{KL} = \begin{cases} P_K & \text{if } f_{KL} \ge 0, \\ P_L & \text{if } f_{KL} < 0, \end{cases}$$

are the normal velocity on $E_{KL}$ and the first-order upwinding scheme, respectively.
In matrix form, the FVM step for time increment $\Delta t$ is

$$P^{m+1} = (I - \Delta t A) P^m,$$

where $I$ is the identity matrix and $A$ is the sparse matrix defined by the update above. This formula
is essentially Euler’s famous formula for the (matrix) exponential.

Since $f_{KL} = -f_{LK}$, the FVM conserves probability at each step, i.e.,
$\sum_K |K| P_K^{m+1} = \sum_K |K| P_K^m$. The FVM also preserves positivity of the pdf when
the time step $\Delta t$ is small enough that the matrix $I - \Delta t A$ has all non-negative
entries. It is straightforward to show that positive entries of the matrix $A$ can occur
on the diagonal only. Hence, the Courant-Friedrichs-Lewy (CFL) type condition
that assures that the FVM iteration is positivity preserving may be written

$$\Delta t \le \frac{1}{\max_i A_{ii}}. \tag{6}$$

With this condition, the FVM both conserves probability and is positivity preserving,
hence is a (discrete) Markov operator. In contrast, the numerical method for the
matrix exponential in MATLAB, for example, does not preserve positivity for the
class of matrices considered here.
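
The upwind step and the positivity bound can be illustrated on a periodic one-dimensional mesh with uniform cells. This is a minimal sketch under those assumptions, not the solver used by the authors; the function names, the velocity field and the initial bump are all made up for illustration.

```python
import math

def upwind_step(P, f_face, h, dt):
    """One explicit first-order upwind FVM step on a periodic 1-D mesh.

    f_face[i] is the velocity at the interface between cell i and cell i+1
    (indices modulo n), counted positive to the right; h is the cell width.
    """
    n = len(P)
    # upwinding: the flux through a face carries the value of the cell it leaves
    flux = [f_face[i] * (P[i] if f_face[i] >= 0 else P[(i + 1) % n])
            for i in range(n)]
    # flux[i - 1] with i = 0 wraps around to the last face (periodic mesh)
    return [P[i] - dt / h * (flux[i] - flux[i - 1]) for i in range(n)]

def cfl_time_step(f_face, h):
    """Largest dt for which I - dt*A has no negative entry, cf. Eq. (6)."""
    n = len(f_face)
    # diagonal of A: total outflow rate of cell i divided by its volume
    diag = [(max(f_face[i], 0.0) + max(-f_face[i - 1], 0.0)) / h
            for i in range(n)]
    return 1.0 / max(diag)

n = 100
h = 2 * math.pi / n
f_face = [1.0 + 0.5 * math.sin((i + 1) * h) for i in range(n)]  # assumed velocity
dt = 0.9 * cfl_time_step(f_face, h)                             # safety margin

# normalized bump as initial cell averages
P = [math.exp(-0.5 * ((i + 0.5) * h - math.pi) ** 2 / 0.1) for i in range(n)]
total = sum(P) * h
P = [p / total for p in P]

for _ in range(200):
    P = upwind_step(P, f_face, h, dt)
mass = sum(P) * h   # stays at 1 up to rounding
```

Because each face flux is subtracted from one cell and added to its neighbour, the total mass telescopes, and with the time step below the bound every new cell value is a non-negative combination of old ones.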


4 Continuous-Time Frobenius-Perron Operator 
and Convergence of the FVM Approximation 


In this section we summarize results from [7], to develop some analytic properties
of the continuous-time solution, and establish properties and distributional convergence
of the numerical approximations produced by the FVM solver.

Let $X(\cdot, t) : \mathbb{R}^d \to \mathbb{R}^d$ denote the map from initial condition to the solution of
Eq. (1) at time $t > 0$, and $Y(\cdot, t) = X(\cdot, t)^{-1}$. If $X(\cdot, t)$ is non-singular
($|Y(E, t)| = 0$ if $|E| = 0$ $\forall$ Borel subsets $E \subset \mathbb{R}^d$), then $\forall t > 0$, the associated Frobenius-Perron
operator [6] $S_t : L^1(\mathbb{R}^d) \to L^1(\mathbb{R}^d)$ is defined by

$$\int_E S_t p \, dx = \int_{Y(E, t)} p \, dx \quad \forall \text{ Borel subsets } E \subset \mathbb{R}^d.$$

Given an initial pdf $p_0$, the pdf $p(\cdot; t)$ at some future time, $t > 0$, may be
computed by solving (see, e.g., [6, Def. 3.2.3 and §7.6])

$$\begin{aligned}
\frac{\partial}{\partial t} p + \operatorname{div}(f p) &= 0 && \forall x \in \mathbb{R}^d,\ t > 0, \\
p(x; 0) &= p_0(x) && \forall x \in \mathbb{R}^d.
\end{aligned} \tag{7}$$

Then, $\forall t > 0$, the Frobenius-Perron operator $S_t : L^1(\mathbb{R}^d) \to L^1(\mathbb{R}^d)$ is defined
such that for any $p \in L^1(\mathbb{R}^d)$,

$$S_t p = p(\cdot; t),$$

where $p$ is a solution to the IVP (7) with $p_0 = p$. Existence of a Frobenius-Perron
operator and (weak) solutions to the IVP depends on the regularity of $f$.


Definition 1 (Definition 3.1.1 in [6]) A linear operator $S : L^1(\mathbb{R}^d) \to L^1(\mathbb{R}^d)$ is
a Markov operator (or satisfies the Markov property) if for any $f \in L^1(\mathbb{R}^d)$ such
that $f \ge 0$,

$$S f \ge 0 \quad \text{and} \quad \|S f\|_{L^1(\mathbb{R}^d)} = \|f\|_{L^1(\mathbb{R}^d)}.$$

If $f$ has continuous first order derivatives and solutions to Eq. (1) exist for all
initial points $x_0 \in \mathbb{R}^d$ and all $t > 0$ then the Frobenius-Perron operator is well-defined,
satisfies the Markov property, and $\{S_t : t > 0\}$ defines a continuous
semigroup of Frobenius-Perron operators.


FVM Approximation For computational purposes it is necessary to numerically
approximate the Frobenius-Perron operators. We use piece-wise constant function
approximations on a mesh and the FVM.

Define a mesh $\mathcal{T}$ on $\mathbb{R}^d$ as a family of bounded, open, connected, polygonal,
disjoint subsets of $\mathbb{R}^d$ such that $\mathbb{R}^d = \bigcup_{K \in \mathcal{T}} \overline{K}$. We assume that the common
interface between two cells is a subset of a hyperplane of $\mathbb{R}^d$, and the mesh is
admissible, i.e.,

$$\exists \alpha > 0 : \quad \alpha h^d \le |K| \quad \text{and} \quad |\partial K| \le \frac{1}{\alpha} h^{d-1} \quad \forall K \in \mathcal{T},$$

where $h = \sup\{\operatorname{diam}(K) : K \in \mathcal{T}\}$, $|K|$ is the $d$-dimensional Lebesgue measure of
$K$, and $|\partial K|$ is the $(d - 1)$-dimensional Lebesgue measure of $\partial K$.

We will use superscript $h$ to denote numerical approximations (though, strictly,
we should use $\mathcal{T}$ as $h$ does not uniquely define the mesh).

The following gives the CFL condition for the (unstructured) mesh $\mathcal{T}$. For
some $\xi \in [0, 1)$ and $c_0 > 0$, we say that $\Delta t$ satisfies the CFL condition if

$$\Delta t \sum_{L \sim K} \max\{0, f_{KL}\} \le (1 - \xi) |K| \quad \forall K \in \mathcal{T} \quad \text{and} \quad \Delta t \le c_0 h. \tag{8}$$
L~K 


Lemma 1 If $\Delta t$ satisfies the CFL condition in Eq. (8) and $p_0 \ge 0$ then

$$p^h(x; t) \ge 0 \quad \forall x \in \mathbb{R}^d,\ t > 0,$$

and $S_t^h$ is a Markov operator.


The following theorems establish convergence of solutions of the FVM, and 
convergence of expectations with respect to the filtering distributions. 


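Theorem 1 Suppose $\operatorname{div} f = 0$, $p \in BV(\mathbb{R}^d)$, and $\Delta t$ satisfies the CFL condition
for some $\xi \in (0, 1)$. Then $\forall t \ge 0$,

$$\|S_t p - S_t^h p\|_{L^1(\mathbb{R}^d)} \le C \xi^{-1} \|p\|_{TV} \left( t^{1/2} h^{1/2} + \xi^{1/2} t h \right).$$

Convergence of expectations is a consequence of convergence of our FVM.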


Theorem 2 Suppose $H, T < \infty$. Under the same assumptions as the previous Theorem,
if:

1. $g \in L^\infty(\mathbb{R}^d)$, or
2. $g \in L^\infty_{\mathrm{loc}}(\mathbb{R}^d)$ and $p$ has compact support,

then there exists a constant $C$ independent of $h$ and $t$ such that

$$\left| E_{S_t^h p}[g] - E_{S_t p}[g] \right| \le C h^{1/2} \quad \forall t \in [0, T],\ h \in (0, H].$$

This guarantees convergence in distribution of the discrete approximation to the
continuous-time filtering pdfs in the limit $h \to 0$.

In numerical tests [7] we found convergence to be $O(h)$, which is twice the order
predicted by Theorem 2. Since the CFL condition requires that the time step is also $O(h)$,
the method is $O(\Delta t)$ accurate. Thus the FVM method we use achieves the highest
order permitted by the meta-theorem of Bolley and Crouzeix [2], that positivity-preserving
Runge-Kutta methods can be first order accurate, at most.


5 Computed Examples


We now present a computed example of the finite volume filter (FVF) that uses the 
FVM for implementing the Frobenius-Perron operator during the prediction phase, 
and the update rule, Eq. (4) evaluated using mid-point quadrature. 

Further details of these numerical experiments can be found in [4], including 
comparison to filtering produced by the unscented Kalman filter (UKF). 


5.1 FVF Tracking of a Pendulum from Measured Force 


Figure 5 shows four snapshots of the filtering pdf for the stylized model of Sect. 1.1.
The ‘true’ pendulum was simulated with initial angular displacement $0.2\pi$ and zero
angular velocity. Eight measured values of the force were recorded per $2\pi$ time, over a time period
of $3\pi$, with added Gaussian noise having $\sigma = 0.2$. The FVF was initialized with
$N(0, 0.8^2 I)$.

Since the initial and filtering pdfs are symmetric about the origin, the means
of angular displacement and velocity are always identically zero. Hence, filtering
methods such as the UKF, or any extension of the Kalman filter that assumes
Gaussian pdfs, or that focus on the mean as a ‘best’ estimate, will estimate the state
as identically zero, for all time. Clearly, this is uninformative.

In contrast, the FVF has localized the true state after $3\pi$ time (about 1.5 periods),
albeit with ambiguity in sign. Properties of the system that do not depend on the
sign of the state, such as the period, the length of the pendulum, or the mass of
the pendulum, can then be accurately estimated. The computed example in [7]
shows that the length of the pendulum is correctly determined after just 1.5 periods, and that
the accuracy of the estimate improves with further measurements (and time). The
same feature holds for estimating mass. Hence, the FVM is successful in accurately
estimating the parameters in the stylized model, even when the measurements leave
ambiguity in the kinematic state.

Fig. 5 Initial ($t = 0$) and filtered pdfs in phase-space after measurements at times
$t = \pi/4$, $\pi$, and $3\pi$ (left to right, top to bottom)


6 Conclusions 


Bayes-optimal filtering for a dynamical system requires solving a PDE. It is 
interesting to view filters in terms of the properties of the implicit, or explicit, PDE 
solver, such as the density function representation and technology used in the PDE 
solver. This paper develops an FVM solver using the simplest-possible discretization
to implement a Bayes-optimal filter, which turned out to be computationally feasible
for low-dimensional smooth systems.

The reader may be interested to know that TCAM won his wager, and beer, with 
new sequential inference algorithms now producing useful results on the farm. In 
the interests of full disclosure we should also report that the original notion of 
utilizing a simple dynamical-systems model for a walking cow did not perform 
well, as the model ‘cow’ would eventually walk upside down, just as the pendulum 
prefers to hang downwards. In response, we developed models for cow locomotion 
based on energy conservation, that are beyond the scope of this paper. However, 
the FVF has found immediate application in other dynamic estimation problems 
where a dynamical model that evolves a state vector works well, such as estimating 
the equilibrium temperature of milk during steaming as a tool for training coffee 
baristas.


Heikki Haario, Janne Hakkarainen, Ramona Maraia, and Sebastian Springer


Abstract A new approach was recently introduced for estimating the
parameters of chaotic dynamical systems. Here we apply the method to stochastic
differential equation (SDE) systems. It turns out that the basic version of the
approach does not identify such systems. However, a modification is presented that
enables efficient parameter estimation of SDE models. We test the approach with
basic SDE examples, compare the results to those obtained by usual state-space
filtering methods, and apply it to more complex cases where the more traditional
methods are no longer available.


1 Introduction 


The difficulty of estimating parameters of chaotic dynamical models is related 
to the fact that a fixed model parameter does not correspond to a unique model 
integration, but to a set of quite different solutions as obtained by setting slightly 
different initial values, selecting numerical solvers used to integrate the system, or 
tolerances specified for a given solver. But while all such trajectories are different, 
they approximate the same underlying attractor and should be considered in this 
sense equivalent. In [4] we introduced a distance concept for chaotic systems based 
on this insight. Modifying one of the fractal dimension definitions, the correlation 
dimension, we calculate samples from the phase space of the system and map these 
points onto a stochastic vector. The vector turns out to be Gaussian, providing a
natural likelihood concept that quantifies the chaotic variability of points of a chaotic
system within a given setting of observations. 

Stochastic differential equation (SDE) systems behave partly in a similar way:
each integration of a given system with fixed model parameters produces a different
realization. This calls for methods that can quantify the variability of the realizations.
On the other hand, the stochastic nature of an SDE system is clearly different
from the chaotic variability of a deterministic chaotic system. Consequently, the
phase space behavior of each type of system is different as well. The aim of
this work is to study to what extent the parameter estimation approach originally
developed for chaotic systems can be applied to SDE models.

The rest of the paper is organized as follows. In the Background section we 
recall the correlation integral likelihood concept and outline the results obtained 
for chaotic systems. In Numerical experiments we exhibit the performance of 
the method for the Ornstein-Uhlenbeck model and extensions of it, together with 
comparisons to more standard, Kalman filter based methods. 


2 Background 


The standard way of estimating parameters of dynamical systems is based on the 
residuals between the data and the model responses, both given at the time points 
of the measurements. Supposing the statistics of the measurement error is known, 
a well defined likelihood function can be written. The maximum likelihood point 
is typically considered as the best point estimator, and it coincides with the usual 
least squares fit in the case of Gaussian noise. The full posterior distribution of 
parameters can be sampled by Markov chain Monte Carlo (MCMC) methods. The 
approach has become routine for the parameter estimation of deterministic models 
in Bayesian inference. 

The estimation of the parameters of stochastic models is not so straightforward. 
A given model parameter does not correspond to a fixed solution, but a whole 
range of possible realizations. Several methods have been proposed to overcome 
this difficulty. State-based approaches estimate the joint distribution of the state 
vector and the parameters. The likelihood for the parameter is obtained as a marginal 
distribution, effectively by ‘integrating out’ the state space. This approach is routine 
in the context of linear time series modeling, and implemented by the likelihood 
obtained by application of the Kalman filter formulas, see [2, 7, 11]. 

Here we study a different way of characterizing the stochastic variability of the 
state space. Supposing that a sufficient amount of data is available, we create a 
mapping from it onto a feature vector. The mapping is based on averaging, and 
the vector turns out to be asymptotically Gaussian. From real data, the mean and 
covariance of this Gaussian distribution can be empirically estimated. Thus we have 
a likelihood available, both for maximum likelihood parameter estimation and for 
MCMC sampling of the parameter posterior. The idea is the same as that earlier 
used for estimating parameters of chaotic models in [4], but certain modifications
are needed for SDE systems. We discuss the basic setting of the approach below, as
well as the reasons behind the modifications needed. 


2.1 Likelihood via Filtering 


A standard way of estimating the parameters of stochastic models is to use
filtering methods for constructing the likelihood (see, e.g., [2, 7, 11] for basic
references and implementation, or [8] for a recent variant). By using the Kalman filter,
the idea is to build the marginal filter likelihood from the prediction residual $r_k$ and
its error covariance matrix $C_k^r$ at each filtering time step $k$.

The basic linear Kalman filter is written as a pair

$$x_k = M_k x_{k-1} + \xi_k, \tag{1}$$

$$y_k = H_k x_k + \varepsilon_k, \tag{2}$$

where $x_k$ is the state and $y_k$ is the measurement vector. Matrix $M_k$ is the linear state-space
model, and matrix $H_k$ is the observation operator that maps from the state
space to the observation space. The error terms $\xi_k$ and $\varepsilon_k$ are typically assumed zero
mean and Gaussian: $\xi_k \sim N(0, Q_k)$ and $\varepsilon_k \sim N(0, R_k)$. This dynamical system is
solved using Kalman filter formulas (see, e.g., [11]).

Given a set of observations $y_{1:K}$ and the parameter vector $\theta$, the marginal filter
likelihood is written as

$$p(y_{1:K} | \theta) = \exp\left( -\frac{1}{2} \sum_{k=1}^{K} \left[ r_k^T (C_k^r)^{-1} r_k + \log |C_k^r| \right] \right), \tag{3}$$

where $| \cdot |$ denotes the matrix determinant. Here the prediction residual and its error
covariance matrix are calculated by the formulas

$$r_k = y_k - H_k x_k^p, \tag{4}$$

$$C_k^r = H_k C_k^p H_k^T + R_k, \tag{5}$$

where $x_k^p = M_k x_{k-1}^{est}$ is the prior estimate computed from the previous state,
and $C_k^p = M_k C_{k-1}^{est} M_k^T + Q_k$ is the respective error covariance matrix. Note
that the normalizing ‘constant’ $|C_k^r|$ has to be included, since it depends on the
parameters via the prediction model.
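
For a scalar state the filter likelihood reduces to a short recursion. The sketch below assumes a time-invariant scalar model with hypothetical parameter names (`m`, `q`, `hobs`, `r`) and returns the logarithm of the likelihood; it is meant as an illustration of the formulas, not as the implementation used for the experiments.

```python
import math
import random

def kf_log_likelihood(ys, m, q, hobs, r, x0, c0):
    """log p(y_{1:K} | theta) for x_k = m x_{k-1} + xi_k, y_k = hobs x_k + eps_k."""
    x_est, c_est = x0, c0
    total = 0.0
    for y in ys:
        # prediction (prior) mean and variance
        x_p = m * x_est
        c_p = m * c_est * m + q
        # prediction residual and its variance
        res = y - hobs * x_p
        c_r = hobs * c_p * hobs + r
        # the log-variance term depends on the parameters, so it must be kept
        total += res * res / c_r + math.log(c_r)
        # standard Kalman update
        gain = c_p * hobs / c_r
        x_est = x_p + gain * res
        c_est = (1.0 - gain * hobs) * c_p
    return -0.5 * total

# synthetic data from made-up 'true' values
random.seed(1)
m_true, q_true, r_true = 0.9, 0.1, 0.05
x, ys = 0.0, []
for _ in range(2000):
    x = m_true * x + random.gauss(0.0, math.sqrt(q_true))
    ys.append(x + random.gauss(0.0, math.sqrt(r_true)))

ll_true = kf_log_likelihood(ys, m_true, q_true, 1.0, r_true, 0.0, 1.0)
ll_wrong = kf_log_likelihood(ys, 0.5, q_true, 1.0, r_true, 0.0, 1.0)
```

With this much data the log-likelihood at the generating value of `m` is clearly larger than at the mismatched one, which is what a maximum likelihood or MCMC search exploits.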

This approach is well established in the framework of linear time series or linear
SDE systems, where the additive model noise is known or may be estimated as
one of the unknowns in the vector $\theta$. In case the drift part of the system (1) is
nonlinear, one still may use the approach in the extended Kalman filter (EKF)
form, based on the approximation by linearization. Often the EKF approach is also
applied to filtering of deterministic systems. In that setting the model error term
is rather postulated and interpreted as a measure of bias. The covariances $Q$ and
$R$ then represent our trust in the model and data, respectively; previous work [5],
motivated by closure parameter estimation in climate research, is an example of this
approach. A related option is to employ ensemble filtering. In [12] this approach
was employed in order to tune the ensemble prediction system parameters. It was
observed, however, that the method resulted in a highly stochastic cost function that
prevented a successful application of parameter optimization algorithms. Moreover,
the tuning parameters of the filter itself may bias the model parameter estimation,
see [6]. Recently, some additional criticism toward using filtering for estimating
the parameters in real-world applications (other than finance) has been presented;
see [10].

Next, we present the method developed in [4] for deterministic chaotic systems. 
While computationally more demanding, it is free of the pitfalls listed above, and 
can be applied to stochastic systems more general than the class of additive noise 
given by (1). 


2.2 Correlation Integral Likelihood


In this section we briefly summarize the correlation integral likelihood method used 
for creating a likelihood for complex patterns [4]. 

Let us use the notation $s = s(\theta, x)$ for a state vector $s$ that depends on parameters
$\theta$ and other inputs $x$ such as, e.g., the initial values of a dynamical system. We
consider two different trajectories, $s = s(\theta, x)$ and $\tilde{s} = s(\tilde{\theta}, \tilde{x})$, evaluated at $N \in \mathbb{N}$
time points $t_i$, $i = 1 : N$, with explicit dependency on the respective initial and
parameter values. For $R \in \mathbb{R}$, the modified correlation sum is defined as

$$C(R, N, \theta, x, \tilde{\theta}, \tilde{x}) = \frac{1}{N^2} \sum_{i,j} \#\left( \|s_i - \tilde{s}_j\| < R \right). \tag{6}$$

In the case $\tilde{\theta} = \theta$ and $\tilde{x} = x$ the formula reduces to the well known definition
of the correlation sum. The Correlation Integral is then defined as the limit
$C(R) = \lim_{N \to \infty} C(R, N)$, and the Correlation Dimension $\nu$ as the limit

$$\nu = \lim_{R \to 0} \log C(R) / \log(R).$$

In numerical practice, the limit $R \to 0$ is approximated by the small scale values
of the ratio above, by the log-log plot obtained by computing $\log C(R)$ at various
values of $\log R$.

However, we do not focus on the small-scale limit as in the above definition, but
rather use the expression (6) at all relevant scales $R$ to characterize the distance
between two trajectories. For this purpose, a finite set of decreasing radii
$R = (R_k)$, $k = 1, \ldots, M$, is chosen. The radii values $R_k$ are selected so as to involve
both small and large scale properties of the trajectory samples. Typically, the radii
are chosen as $R_k = b^{-k} R_0$, with $R_0 = \max_{i,j} \|s_i - \tilde{s}_j\|$ or somewhat larger to
ensure that all the values are inside the largest radius. The values of $M$ and $b$ should
be chosen in a way that $R_M$ is small enough. For more details see [4].
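
The correlation sum evaluated at a set of radii can be written in a few lines. The sketch below is a direct, quadratic-cost evaluation for small samples, assuming each state is given as a tuple of floats; the function names are illustrative and the indexing $k = 1, \ldots, M$ of the radii follows the text above.

```python
import math

def correlation_vector(s, s_tilde, radii):
    """Fraction of pairs (s_i, s_tilde_j) closer than R, for each R in radii."""
    dists = [math.dist(a, b) for a in s for b in s_tilde]
    # len(dists) equals N^2 when both samples have N points
    return [sum(d < R for d in dists) / len(dists) for R in radii]

def geometric_radii(r0, b, m):
    """Decreasing radii R_k = b^(-k) * R_0 for k = 1, ..., m."""
    return [r0 * b ** (-k) for k in range(1, m + 1)]

# tiny scalar example with hand-checkable distances 0.5, 2.0, 0.5, 1.0
s = [(0.0,), (1.0,)]
s_tilde = [(0.5,), (2.0,)]
y = correlation_vector(s, s_tilde, [3.0, 1.5, 0.75])   # -> [1.0, 0.75, 0.5]
```

Each component of `y` is the empirical distribution function of the pairwise distances evaluated at one radius, so the components are non-increasing when the radii decrease.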

Consider now the case with given data $s_i$, which corresponds to the case of a
fixed but unknown model parameter vector, $\theta = \tilde{\theta} = \theta_0$. We select two subsets $s$
and $\tilde{s}$ of size $N$ from the data (see more details below). If we fix the radii values
$R = (R_k)$, $k = 1, \ldots, M$, the expression (6) defines an $M$-dimensional vector
with components $y_k = C(R_k, \theta_0, x)$. A training set of these vectors is created by
repeatedly selecting the subsets $s$ and $\tilde{s}$. The statistics of this vector can then be
estimated in a straightforward way.

Indeed, the expression (6) is an average of distances, so by the Central Limit
Theorem it might be expected to be Gaussian. More exactly, each expression
$y = (y_k)$ gives the empirical cumulative distribution function of the respective set of
distances. The basic form of Donsker’s theorem states that empirical distribution
functions asymptotically tend to a Brownian bridge. In a more general setting, close
to what we employ here, the Gaussianity was established by Borovkova et al. [1].

At a pseudo code level the procedure can be summarized as follows:

• Using the measured data, create a training set of the vectors $y$ for fixed radii
  values $(R_k)$ by sampling data at measurement times $(t_i)$.

• Create the empirical statistical distribution of the training set $y$ as a Gaussian
  likelihood, by computing the mean $\mu$ and the covariance $\Sigma$ of the training set
  vectors.

• Find the maximum likelihood model parameter $\theta_0$ of the distribution

  $$p_{\theta_0}(\theta, x) \sim \exp\left( -\frac{1}{2} (\mu - y(\theta, x))^T \Sigma^{-1} (\mu - y(\theta, x)) \right).$$

• Sample the likelihood to find those model parameters $\theta$ for which the vector
  $y = C(\theta_0; x; \theta; \tilde{x})$ belongs to the distribution $N(\mu, \Sigma)$.

The first step will be discussed in more detail in the examples below. Note that in
[4] we assumed a parameter value $\theta_0$ given and created the training data by model
simulations, while here we start with given data, create the training set from subsets
of data, and proceed to estimate a maximum likelihood parameter value $\theta_0$.
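
The second and third steps of the procedure, fitting a Gaussian to the training vectors and evaluating its exponent for a candidate feature vector, can be sketched as follows. This is a plain-Python illustration with hypothetical function names and made-up training vectors; it assumes the empirical covariance is positive definite, which may need regularization in practice.

```python
import math

def mean_and_cov(vectors):
    """Sample mean and (unbiased) sample covariance of a list of vectors."""
    n, m = len(vectors), len(vectors[0])
    mu = [sum(v[k] for v in vectors) / n for k in range(m)]
    cov = [[sum((v[i] - mu[i]) * (v[j] - mu[j]) for v in vectors) / (n - 1)
            for j in range(m)] for i in range(m)]
    return mu, cov

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    m = len(a)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def neg_log_likelihood(y, mu, chol):
    """0.5 * (mu - y)^T Sigma^{-1} (mu - y), using Sigma = chol chol^T."""
    d = [m_ - y_ for m_, y_ in zip(mu, y)]
    w = []
    for i in range(len(d)):      # forward substitution: chol w = d
        w.append((d[i] - sum(chol[i][k] * w[k] for k in range(i))) / chol[i][i])
    return 0.5 * sum(v * v for v in w)

# made-up training set of two-component feature vectors
training = [[0.9, 0.5], [1.0, 0.6], [0.8, 0.4], [1.0, 0.4]]
mu, cov = mean_and_cov(training)
chol = cholesky(cov)
cost_at_mean = neg_log_likelihood(mu, mu, chol)   # zero by construction
```

A parameter search or MCMC sampler would call `neg_log_likelihood` with the feature vector computed from a model simulation at each candidate parameter value.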


Remark In all the cases the prior distribution is assumed to be flat uniform. 


3 Numerical Experiments 


The main objective of this section is to modify the Correlation integral likelihood 
(CIL) method for identifying SDE system parameters. The new version of the 
method is compared with the filter likelihood results. After this validation the 
approach is applied to a more complex case. 


3.1 Ornstein-Uhlenbeck with Modification for Dynamics 


We start with a basic SDE example, the Ornstein-Uhlenbeck (OU) process model.
We use it as a benchmark to verify that the CIL method is able to produce results
comparable to standard filter likelihood methods in a setting where these classical
methods perform perfectly well. The OU process equation is given by

$$dX_t = -\theta X_t \, dt + \sigma \, dW_t. \tag{7}$$

In the numerical simulations, we use $\theta = 10$ and $\sigma = 1.5$ as the ‘true’ values.
For simplicity, the mean value of the process is set to zero (but all the results and
conclusions are valid for a non-zero mean as well). We create a data signal of 3000
points on the time interval [0, 30], with initial value $X = 0$.

Figure 1 exhibits the signal used as data, obtained by integration of (7) using the
Euler-Maruyama method, with a time step $dt = 0.01$ and using a fixed Gaussian
$N(0, \sigma^2)$ as the diffusion part. The figure presents three different realizations.
Note that essentially the same results as those given below were obtained with any of the
realizations used.
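
A data signal of this kind can be generated with a few lines of Euler-Maruyama. The sketch below assumes the standard zero-mean OU form $dX = -\theta X\,dt + \sigma\,dW$ with Brownian increments of variance $dt$; the exact noise scaling used for the experiments is not spelled out in the text, so this scaling, the function name and the seed are assumptions.

```python
import math
import random

def simulate_ou(theta, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama path of dX = -theta X dt + sigma dW, including x0."""
    xs = [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment over dt
        xs.append(xs[-1] - theta * xs[-1] * dt + sigma * dw)
    return xs

rng = random.Random(0)   # fixed seed so the realization is reproducible
signal = simulate_ou(theta=10.0, sigma=1.5, x0=0.0, dt=0.01, n_steps=3000, rng=rng)

mean = sum(signal) / len(signal)
var = sum((x - mean) ** 2 for x in signal) / len(signal)
```

For the continuous process the stationary variance is $\sigma^2 / (2\theta) = 0.1125$, so the sample variance of a long path should be of that order; running the function with different seeds gives different realizations of the same process.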

Let us first apply the CIL method in the basic form. To create the sample sets $s_i$
we randomly select 1500 of the data points of the signal in Fig. 1 and use the rest
of the points as $\tilde{s}_j$ to get the set of distances needed in (6). This process is repeated
around 2000 times to get a representative set of the feature vectors $y$. The likelihood
is then obtained by computing the mean and covariance of the training vectors $y$,
and the Normality of the vectors can be verified by the usual $\chi^2$ test.

Next, we find the distribution of the model parameters $\theta, \sigma$ that follows this
distribution by creating an MCMC chain of length 20,000 using adaptive Metropolis
[3]. The result in Fig. 2 shows, however, that the model parameters are not identified
by this likelihood. This situation is different from those reported in [4], and several
unpublished cases, for chaotic systems, where the same likelihood construction is
able to identify the model parameters.
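
To make the sampling step concrete, here is a plain random-walk Metropolis sampler. It is a simplified stand-in and not the adaptive Metropolis algorithm of [3], since the proposal width is fixed rather than adapted; the function name, the step size and the test target are assumptions for illustration. With a flat prior, `log_post` is just the log-likelihood.

```python
import math
import random

def metropolis(log_post, start, step, n_samples, rng):
    """Random-walk Metropolis chain with isotropic Gaussian proposals."""
    chain = [list(start)]
    current_lp = log_post(chain[0])
    accepted = 0
    for _ in range(n_samples - 1):
        proposal = [c + step * rng.gauss(0.0, 1.0) for c in chain[-1]]
        lp = log_post(proposal)
        # accept with probability min(1, exp(lp - current_lp))
        if rng.random() < math.exp(min(0.0, lp - current_lp)):
            chain.append(proposal)
            current_lp = lp
            accepted += 1
        else:
            chain.append(chain[-1])
    return chain, accepted / (n_samples - 1)

# stand-in target: a two-parameter standard Gaussian log-density
def log_target(p):
    return -0.5 * sum(v * v for v in p)

chain, acc_rate = metropolis(log_target, [0.0, 0.0], 1.5, 20000, random.Random(0))
chain_mean = [sum(p[k] for p in chain) / len(chain) for k in range(2)]
```

In the setting of the text, `log_target` would be replaced by minus the Gaussian exponent of the feature-vector likelihood evaluated at a simulated signal for the candidate parameters.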

We conclude that too much information is lost in the mapping from data to
the feature vectors $y$. Indeed, this is not surprising in view of the fact that only
the distances between randomized data points are considered, while the order or
differences between consecutive points is lost. A trivial example is given by any
vector of random points: sorting it in increasing order gives a definitely different
signal, but with just the same set of points and distances between them.

Fig. 1 Ornstein-Uhlenbeck signal used for the experiments

Intuitively, the mean reverting dynamics is lost here, so some additional modification
of the method is needed. The large posterior in Fig. 2 exhibits only what it is
programmed to do: it collects signals whose distance distributions remain close, which in this
case does not characterize the signals. The feature vector can be modified in various
ways. Here we present the impact of extending it in the obvious way: we include the
differences between consecutive points. We create the feature vectors separately for
the signal and for the differences. The final feature vector is created by concatenating
the curves, and the Gaussianity of the combined vector can be tested by the $\chi^2$ test.
Figure 2 illustrates the posterior obtained using three different levels of information:
only the data signal, only the differences between consecutive points, and both together.
We see how the first two are not enough, while the posterior of the extended case,
practically the intersection of the two other posteriors, significantly improves the
identification.
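
The extended feature vector can be sketched as the concatenation of two correlation vectors, one for the signal values and one for the consecutive differences. How exactly the differences are subsampled is not stated in the text; the sketch assumes they are randomly split in the same way as the signal. All names and the toy signal are illustrative.

```python
import random

def correlation_vector(a, b, radii):
    """Fraction of scalar pairs (a_i, b_j) closer than R, for each R in radii."""
    n_pairs = len(a) * len(b)
    return [sum(1 for u in a for v in b if abs(u - v) < R) / n_pairs
            for R in radii]

def random_split(values, rng):
    """Randomly split a list into two halves (sizes differ by at most one)."""
    idx = list(range(len(values)))
    rng.shuffle(idx)
    half = len(idx) // 2
    return [values[i] for i in idx[:half]], [values[i] for i in idx[half:]]

def extended_feature(signal, radii_state, radii_diff, rng):
    """Concatenate correlation vectors of the signal and of its differences."""
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    s, s_tilde = random_split(signal, rng)
    d, d_tilde = random_split(diffs, rng)
    return (correlation_vector(s, s_tilde, radii_state)
            + correlation_vector(d, d_tilde, radii_diff))

# toy signal: a straight ramp, whose differences are all equal to one
ramp = [float(i) for i in range(10)]
feature = extended_feature(ramp, [100.0, 2.5], [0.5], random.Random(0))
```

Sorting a signal leaves the first part of the feature vector unchanged but alters the difference part, which is the information the basic version discards.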

Next, we compare the Correlation Integral Likelihood results with those obtained
by filter likelihood estimation based on Kalman filtering. We use the same data
signal as above, using all the points $X_k$, $k = 1, \ldots, 3000$ as exact measurements (no
noise added) of the state vectors, and create MCMC samples of the likelihood given
by the expression (3). The comparison is presented in Fig. 3. As expected, the filtering
method is more accurate with this amount of data (we use every Euler-Maruyama
integration step as data for filtering), but the results by CIL are comparable.

Fig. 2 The use of both state and difference information leads to a posterior (yellow) that is located

Fig. 3 Illustration of the results obtained by comparing CIL with the Filter likelihood method in
parameter estimation for a zero mean Ornstein-Uhlenbeck

Remarks In the above examples we have used the known value of $\theta$ as the starting
point for the MCMC sampling. However, as the likelihood is created by the data
signal, we can equally well use it as the cost function to estimate $\theta$ first. We omit
here the details of this step.


Note that there is a difference in computational times of the two methods; in this
particular case they are approximately 20 min for CIL and around 6 min for KF. The
difference is basically due to the additional computation of the distances needed for
CIL.

Note that using a larger time step between data points would decrease the
accuracy of the KF estimate. However, it does not impact the CIL estimate, as it
is based on independent samples $X_i$ in random order, not on predicting $X_{i+1}$ by $X_i$.

Finally, we note that the use of the present modification, including the system 
‘dynamics’ by signal differences, is not limited to the OU example. Rather, it can 
be used generally to improve the model parameter identification of both SDE and 
deterministic chaotic systems. However, a more detailed discussion is outside the 
scope of this work. 


3.2 Stochastic Chaos 


Here we study the CIL approach for chaotic dynamics, extended with stochastic
perturbations. Now the stochasticity is no longer of the additive form (1) but is
contained in the model equations in a nonlinear way. The specific forms of the
perturbations discussed here come from meteorology. In the so-called Ensemble
Prediction Systems (EPS) an ensemble of weather predictions, with carefully
perturbed initial values, is launched together with the main prediction. The motive
is to create probabilistic estimates for the uncertainty of the prediction. However,
it is difficult to create a spread of the ensemble predictions that would match the
observed uncertainty; the spread of the model simulations tends to be too narrow.
To increase the spread, the so-called stochastic physics is employed: the right hand
side of the model differential equation is multiplied by a random factor (close to one)
at every integration step. More recently, so-called stochastic parametrization is used
in addition: certain model parameters are randomized likewise at every integration
step of the system. For more details of these methods see [9].

As a case study for the parameter estimation with stochastic physics and stochastic
parametrization a classical chaotic attractor, the Rössler system, is chosen. We
give the Rössler system in the form where the stochastic physics is introduced by
the multiplicative factors $1 + c_k \varepsilon_k$, and the model parameters $\alpha, \beta, \gamma$ are likewise
replaced by perturbed terms $\alpha + c_k \varepsilon_k$, etc., $k = 1 : 6$, $\varepsilon_k \sim N(0, 1)$. The system
reads as

$$\begin{aligned}
\dot{X} &= (1 + c_1 \varepsilon_1)(-Y - Z), \\
\dot{Y} &= (1 + c_2 \varepsilon_2)\big(X + (\alpha + c_3 \varepsilon_3) Y\big), \\
\dot{Z} &= (1 + c_4 \varepsilon_4)\big((\beta + c_5 \varepsilon_5) + Z (X - (\gamma + c_6 \varepsilon_6))\big),
\end{aligned} \tag{8}$$

with ‘true’ parameters $\alpha = \beta = 0.2$ and $\gamma = 5.7$. The magnitudes $c_k$ were chosen
so that the maximum relative error would not exceed 40% in any of the cases.

Fig. 4 The X component of the Rössler model with four different options for stochasticity
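
The perturbed Rössler system can be simulated by drawing fresh factors at every integration step. The sketch below uses forward Euler, a fixed step and hypothetical magnitudes `c`; the integrator and these values are assumptions for illustration, since the text does not specify them.

```python
import random

def rossler_rhs(state, eps, c, alpha=0.2, beta=0.2, gamma=5.7):
    """Right-hand side of the Rössler system with perturbations c[k] * eps[k]."""
    X, Y, Z = state
    dX = (1 + c[0] * eps[0]) * (-Y - Z)
    dY = (1 + c[1] * eps[1]) * (X + (alpha + c[2] * eps[2]) * Y)
    dZ = (1 + c[3] * eps[3]) * ((beta + c[4] * eps[4])
                                + Z * (X - (gamma + c[5] * eps[5])))
    return (dX, dY, dZ)

def simulate(state, c, dt, n_steps, rng):
    """Forward Euler path; eps_k ~ N(0, 1) is redrawn at every step."""
    path = [state]
    for _ in range(n_steps):
        eps = [rng.gauss(0.0, 1.0) for _ in range(6)]
        deriv = rossler_rhs(state, eps, c)
        state = tuple(s + dt * ds for s, ds in zip(state, deriv))
        path.append(state)
    return path

c = [0.05] * 6   # hypothetical perturbation magnitudes
path = simulate((1.0, 1.0, 0.0), c, dt=0.01, n_steps=1000, rng=random.Random(0))
x_component = [p[0] for p in path]
```

Setting the first, second and fourth entries of `c` to zero leaves only the stochastic parametrization, and setting the other three to zero leaves only the stochastic physics, so the same function covers the different options for stochasticity.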

Figure 4 shows the time evolutions of one of the components, the values of X 
for different combinations of added stochasticity. Each plot consists of 80 runs with 
slightly perturbed initial values. We see that the interval of predictable behavior 
shrinks to almost one half of that of deterministic chaos when both types of 
perturbations are added. 
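The perturbed system (8) can be simulated along the following lines. This is only a minimal sketch: the integrator (classical RK4 with the random factors frozen within each step), the step size `dt`, and the magnitudes in `c` are assumptions made for illustration and are not taken from the text.

```python
import numpy as np

def rossler_rhs(state, eps, c, alpha=0.2, beta=0.2, gamma=5.7):
    """Right-hand side of (8); eps holds the six N(0,1) draws, c their magnitudes."""
    x, y, z = state
    dx = (1 + c[0] * eps[0]) * (-y - z)
    dy = (1 + c[1] * eps[1]) * (x + (alpha + c[2] * eps[2]) * y)
    dz = (1 + c[3] * eps[3]) * ((beta + c[4] * eps[4]) + z * (x - (gamma + c[5] * eps[5])))
    return np.array([dx, dy, dz])

def simulate(x0, c, dt=0.01, n_steps=5000, seed=0):
    """Integrate (8) with RK4, redrawing the perturbations at every step."""
    rng = np.random.default_rng(seed)
    traj = np.empty((n_steps + 1, 3))
    traj[0] = x0
    for n in range(n_steps):
        eps = rng.standard_normal(6)  # new random factors for this step
        s = traj[n]
        k1 = rossler_rhs(s, eps, c)
        k2 = rossler_rhs(s + 0.5 * dt * k1, eps, c)
        k3 = rossler_rhs(s + 0.5 * dt * k2, eps, c)
        k4 = rossler_rhs(s + dt * k3, eps, c)
        traj[n + 1] = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj
```

Setting `c = np.zeros(6)` recovers the deterministic Rössler system, so the same routine can produce both the unperturbed and the perturbed ensembles by varying `x0` and `seed`.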

The task of parameter estimation is now to try to find the distribution of the 
mean value of each of the perturbed parameters. The construction of the likelihood 
is performed via the standard procedure: from a long enough data signal (here, 
produced by simulating (8)) we sample subsets to calculate the distances, and repeat 
this for a number of times to be able to empirically determine the statistics of 
the feature vectors. Again, the Gaussianity of the statistics can be verified. Both 
a maximum likelihood parameter estimate, and the subsequent MCMC sampling 
for the posterior can then be performed. 

For the examples we create the data by simulating (8) over a total time interval
[0, 120,000] and select data points at the frequency shown in Fig. 4 with the green
circles. To get one feature vector y we select two disjoint sets of 2000 consecutive
data points. To create the statistics for y we repeat this procedure around 1800
times. The number of radius values used was 10.
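The sampling procedure just described might be sketched as follows. The feature vector is taken here to be the fraction of cross-pairs of points closer than each radius, which is the usual form of a correlation-integral statistic; the function names, the way the two disjoint windows are drawn, and the choice of radii are assumptions for illustration only.

```python
import numpy as np

def correlation_feature(set_a, set_b, radii):
    """Fraction of pairs (a_i, b_j) with distance below each radius."""
    diff = set_a[:, None, :] - set_b[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    return np.array([(dist < r).mean() for r in radii])

def sample_features(signal, n_points, radii, n_rep, seed=0):
    """Repeat the feature computation on random pairs of disjoint windows."""
    rng = np.random.default_rng(seed)
    n_total = len(signal)
    feats = np.empty((n_rep, len(radii)))
    for i in range(n_rep):
        # first window, leaving room for a second, later, non-overlapping one
        start_a = rng.integers(0, n_total - 2 * n_points + 1)
        start_b = rng.integers(start_a + n_points, n_total - n_points + 1)
        a = signal[start_a:start_a + n_points]
        b = signal[start_b:start_b + n_points]
        feats[i] = correlation_feature(a, b, radii)
    return feats
```

The empirical mean `feats.mean(axis=0)` and covariance `np.cov(feats, rowvar=False)` would then define the Gaussian likelihood for the feature vector.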

The results of the runs for different settings of the perturbations are given in Fig. 5.
We can conclude that the approach performs as expected: the more stochasticity in
the model, the wider are the parameter posteriors. However, in all cases we get
bounded posteriors, and the algorithm performs without any technical issues.

Fig. 5 Parameter posteriors for three different options of stochasticity for the Rössler model


4 Conclusions 


In this work we have applied the recently developed Correlation Integral Likelihood 
method to estimate parameters of stochastic differential equation systems. Certain 
modifications are needed to get satisfactory results, comparable to those achieved 
by standard filter likelihood methods for basic SDE systems. But the main focus is 
on situations where the standard methods are not available, such as the stochastic 
physics and parametrizations employed in meteorology for uncertainty quantification.
Several extensions of the approach are left for future work.


A Set Optimization Technique for Domain Reconstruction
from Single-Measurement Electrical Impedance Tomography Data


Bastian Harrach and Janosch Rieger 


Abstract We propose and test a numerical method for the computation of the 
convex source support from single-measurement electrical impedance tomography 
data. Our technique is based on the observation that the convex source support is the 
unique minimum of an optimization problem in the space of all convex and compact 
subsets of the imaged body. 


1 Introduction 


Electrical impedance tomography is a modern non-invasive imaging technology 
with the potential to complement computerized tomography in treatments like 
pulmonary function diagnostics and breast cancer screening. From a mathematical 
perspective, the reconstruction of the exact conductivity within the imaged body 
from electrical impedance tomography data amounts to solving a strongly ill-posed 
inverse problem. 

The difficulty of this problem can partly be avoided by noting that one is usually 
not interested in the conductivity as such, but only in the domain where it differs 
from the conductivity of healthy tissue. A technique introduced in [5] for scattering 
problems and adapted to electrical impedance tomography later in [3] takes this 
approach one step further by considering a convex set, called the convex source 
support, which contains information on the desired domain, but can be computed 
from a single measurement.

We propose a numerical method for the computation of the convex source support
from electrical impedance tomography data. It is based on the observation that this 
particular set is the unique minimum of an optimization problem in the space of 
all convex and compact subsets of the imaged body. In Sect. 2, we recall the notion 
of the convex source support, and in Sect.3, we formulate the above-mentioned 
optimization problem and manipulate its constraint into a convenient form. In 
Sect. 4, we introduce Galerkin approximations, which are spaces of polytopes with 
fixed outer normals, to the space of all convex and compact subsets of a given 
Euclidean vector space. These spaces possess a sparse and unique representation 
in terms of coordinates. In Sect.5, we discuss how the derivatives of the objective 
function and the constraint of our optimization problem can be computed efficiently. 
In Sect. 6, we gather all the above ingredients and solve the optimization problem 
numerically using a standard interior point method on the Galerkin approximation, 
which yields a numerical approximation of the convex source support. 

This paper is a report on work in progress, which aims to present ideas rather than 
a complete solution of the problem. In particular, we assume that we can measure 
the potential on the entire boundary of the imaged body, which is not possible in 
real-world applications, and we neither include an error analysis nor stability results 
for the proposed algorithm. 


2 The Convex Source Support in Electrical Impedance 
Tomography 


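We consider the following idealistic model of the EIT problem. Let $\Omega \subset \mathbb{R}^d$, $d \ge 2$,
be a smoothly bounded domain describing the imaged body, let $\sigma \in L^\infty(\Omega)$ be the
conductivity within $\Omega$, and let $g \in L^2_\diamond(\partial\Omega)$ be the electric current applied to $\partial\Omega$,
where $L^2_\diamond(\partial\Omega)$ denotes the subspace of $L^2(\partial\Omega)$ with vanishing integral mean on
$\partial\Omega$. Then the electrical potential $u \in H^1_\diamond(\Omega)$ solves

\[
\begin{aligned}
\nabla \cdot (\sigma(x) \nabla u(x)) &= 0, & x &\in \Omega, \\
\sigma \partial_\nu u|_{\partial\Omega}(x) &= g(x), & x &\in \partial\Omega,
\end{aligned} \tag{1}
\]

where $\nu$ is the outer normal on $\partial\Omega$, and $H^1_\diamond(\Omega)$ is the subspace of $H^1(\Omega)$-
functions with vanishing integral mean on $\partial\Omega$.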

Our aim is to find inclusions or anomalies in $\Omega$ where the conductivity $\sigma$ differs
from a reference conductivity value $\sigma_0$ (e.g. that of healthy tissue) from measuring
the electric potential $u|_{\partial\Omega}$ on $\partial\Omega$. To simplify our exposition we assume $\sigma_0 = 1$
throughout this work. More precisely, we aim to find information on $\operatorname{supp}(\sigma - \sigma_0)$
from the data $(u - u_0)|_{\partial\Omega}$, where $u_0$ solves (1) with the same Neumann boundary
data $g$, and $\sigma$ replaced by $\sigma_0$. This is usually referred to as the problem of single
measurement EIT since only one current $g$ is applied to the patient.

Now we introduce the convex source support, following [3] and [5]. First note
that since $u$ and $u_0$ are solutions of (1) with conductivities $\sigma$ and $\sigma_0$ and identical
Neumann data $g \in L^2_\diamond(\partial\Omega)$, their difference $w := u - u_0$ solves the equation

\[ \Delta w = \operatorname{div}((1 - \sigma) \nabla u) \ \text{in } \Omega, \qquad \partial_\nu w = 0 \ \text{on } \partial\Omega, \tag{2} \]

with a source term satisfying

\[ \operatorname{supp}((1 - \sigma) \nabla u) \subset \operatorname{supp}(\sigma - \sigma_0). \tag{3} \]


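This motivates the following construction of the convex source support. Let us
define the virtual measurement operator

\[ L : L^2(\Omega)^d \to L^2_\diamond(\partial\Omega), \quad F \mapsto w|_{\partial\Omega}, \]

where $w \in H^1_\diamond(\Omega)$ solves

\[ \Delta w = \operatorname{div} F \ \text{in } \Omega, \qquad \partial_\nu w = 0 \ \text{on } \partial\Omega. \]

Given a measurement $f = (u - u_0)|_{\partial\Omega} \in L^2_\diamond(\partial\Omega)$, the convex source support of
problem (1) is defined by

\[ \mathcal{C} f = \bigcap_{LF = f} \operatorname{co}(\operatorname{supp}(F)), \]

which is the intersection of the convex hulls of all supports of sources that could
possibly generate the measurement $f$. By Eqs. (2) and (3),

\[ \mathcal{C}(u - u_0)|_{\partial\Omega} \subset \operatorname{co}(\operatorname{supp}((1 - \sigma) \nabla u)) \subset \operatorname{co}(\operatorname{supp}(\sigma - \sigma_0)), \]

which means that the convex source support provides coarse, but reliable information
about the position of the set $\operatorname{supp}(\sigma - \sigma_0)$. In fact, much more is known. The
following theorem, e.g., can be found in [3].


Theorem 1 We have $\mathcal{C} f = \emptyset$ if and only if $f = 0$, and for every $\epsilon > 0$, there exists
$F_\epsilon \in L^2(\Omega)^d$ such that $L F_\epsilon = f$ and $\operatorname{dist}(\operatorname{co}(\operatorname{supp}(F_\epsilon)), \mathcal{C} f) < \epsilon$.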


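3 An Optimization Problem in $\mathcal{K}_c(\mathbb{R}^d)$


For given data $f \in L^2_\diamond(\partial\Omega)$, we recast the computation of the convex source support
as a minimization problem

\[ \operatorname{vol}(D) = \min! \quad \text{subject to} \quad D \in \mathcal{K}_c(\mathbb{R}^d), \quad \mathcal{C} f \subset D \subset \Omega \tag{4} \]

in the space $\mathcal{K}_c(\mathbb{R}^d)$ of all nonempty convex and compact subsets of $\mathbb{R}^d$, which
obviously has the unique solution $D^* = \mathcal{C} f$. To solve the problem (4), we mainly
need a handy criterion to check whether $\mathcal{C} f \subset D$.

By Theorem 1, we have $\mathcal{C} f \subset \operatorname{int} D$ if and only if there exists $F \in L^2(\Omega)^d$ with
$\operatorname{supp}(F) \subset D$ and $LF = f$. In other words, we have to check whether $f \in \mathcal{R}(L_D)$,
i.e. whether $f$ is in the range of the operator $L_D$, where

\[ L_D : L^2(D)^d \to L^2_\diamond(\partial\Omega), \quad F \mapsto w|_{\partial\Omega}, \]

and $w \in H^1_\diamond(\Omega)$ solves

\[ \Delta w = \operatorname{div} F \ \text{in } \Omega, \qquad \partial_\nu w = 0 \ \text{on } \partial\Omega. \]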


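Proposition 1 If the interior of $D \subset \Omega$ is not empty, then $L_D$ is a compact linear
operator with dense range, and

\[ f \notin \mathcal{R}(L_D) \quad \text{if and only if} \quad \lim_{\alpha \to 0} \| R_\alpha^D f \| = \infty, \]

where $R_\alpha^D f := (L_D^* L_D + \alpha I)^{-1} L_D^* f$.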
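Proof $L_D$ is the concatenation of the linear bounded solution operator and the linear
compact trace operator from $H^1_\diamond(\Omega)$ to $L^2_\diamond(\partial\Omega)$ and thus linear and compact. The
adjoint of $L_D$ is given by (see [4, Lemma 2])

\[ L_D^* : L^2_\diamond(\partial\Omega) \to L^2(D)^d, \quad g \mapsto \nabla v_0|_D, \]

where $v_0 \in H^1_\diamond(\Omega)$ solves

\[ \Delta v_0 = 0 \ \text{in } \Omega, \qquad \partial_\nu v_0|_{\partial\Omega} = g \ \text{on } \partial\Omega. \]

By unique continuation, $L_D^*$ is injective and thus $L_D$ has dense range. This also
implies that the domain of definition of the Moore-Penrose inverse $L_D^\dagger$ (cf. [2,
Def. 2.2]) is given by

\[ \mathcal{D}(L_D^\dagger) = \mathcal{R}(L_D) + \mathcal{R}(L_D)^\perp = \mathcal{R}(L_D). \]

Since $R_\alpha^D$ is a linear regularization (the Tikhonov regularization, cf. [2, Section 5]),
and a simple computation shows that

\[ \sup_{\alpha > 0} \| L_D R_\alpha^D \| \le 1, \]

it follows from standard regularization theory (cf., e.g., [2, Prop. 3.6]) that

\[ \lim_{\alpha \to 0} R_\alpha^D f = L_D^\dagger f \quad \text{if } f \in \mathcal{D}(L_D^\dagger) = \mathcal{R}(L_D), \]

and that

\[ \lim_{\alpha \to 0} \| R_\alpha^D f \| = \infty \quad \text{if } f \notin \mathcal{D}(L_D^\dagger) = \mathcal{R}(L_D). \]

This proves the assertion.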


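To implement Proposition 1, we therefore have to check whether the quantity

\[
\begin{aligned}
\| R_\alpha^D f \|^2 &= \| (L_D^* L_D + \alpha I)^{-1} L_D^* f \|^2 = \| L_D^* (L_D L_D^* + \alpha I)^{-1} f \|^2 \\
&= \langle (L_D L_D^* + \alpha I)^{-1} L_D L_D^* (L_D L_D^* + \alpha I)^{-1} f, f \rangle
\end{aligned}
\]

remains bounded as $\alpha \to 0$. Writing $M_D := L_D L_D^* : L^2_\diamond(\partial\Omega) \to L^2_\diamond(\partial\Omega)$, we
obtain the convenient representation

\[ \| R_\alpha^D f \|^2 = \langle (M_D + \alpha I)^{-1} M_D (M_D + \alpha I)^{-1} f, f \rangle. \tag{5} \]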
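The identity behind (5) can be checked numerically in finite dimensions. In the sketch below a random matrix `L` merely stands in for the operator $L_D$; the sizes, the seed and the value of `alpha` are arbitrary choices for illustration.

```python
import numpy as np

def tikhonov_norm_sq(M, f, alpha):
    """<(M + alpha I)^{-1} M (M + alpha I)^{-1} f, f> for a symmetric matrix M."""
    g = np.linalg.solve(M + alpha * np.eye(len(M)), f)
    return float(g @ (M @ g))

def direct_norm_sq(L, f, alpha):
    """||(L^T L + alpha I)^{-1} L^T f||^2, evaluated in the domain of L."""
    n = L.shape[1]
    r = np.linalg.solve(L.T @ L + alpha * np.eye(n), L.T @ f)
    return float(r @ r)

rng = np.random.default_rng(0)
L = rng.standard_normal((5, 8))  # stand-in for L_D, mapping R^8 to R^5
f = rng.standard_normal(5)
print(tikhonov_norm_sq(L @ L.T, f, 1e-2), direct_norm_sq(L, f, 1e-2))
```

The practical point of the representation is that only the matrix of $M_D$ on the data space is needed, never the operator $L_D$ itself.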


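Fix an orthonormal basis $(\varphi_j)_j$ of $L^2_\diamond(\partial\Omega)$. The characterization of $L_D^*$ in [4,
Lemma 2] shows that

\[ \langle M_D \varphi_j, \varphi_k \rangle = \langle L_D^* \varphi_j, L_D^* \varphi_k \rangle = \int_D \nabla u_0^j \cdot \nabla u_0^k \, dx, \tag{6} \]

where $u_0^j$ solves

\[ \Delta u_0^j = 0 \ \text{in } \Omega, \qquad \partial_\nu u_0^j = \varphi_j \ \text{on } \partial\Omega. \tag{7} \]

Note that the integrands $\nabla u_0^j \cdot \nabla u_0^k$ do not depend on $D$ and hence can be
precomputed. Since

\[ \int_D \nabla u_0^j \cdot \nabla u_0^k \, dx = \int_{\partial D} \partial_\nu u_0^j \cdot u_0^k \, ds \]

by the Gauß-Green theorem, even more computational effort can be shifted to the
offline phase, provided the sets under consideration possess essentially finitely many
different normals, which is the situation we consider in Sect. 4 and what follows.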

Proposition 1 gives a mathematically rigorous criterion to check whether a set
D contains the convex source support. In the following we describe a heuristic
numerical implementation of this criterion and test it on a simple test example. Let
us stress that we do not have any theoretical results on convergence or stability of the
proposed numerical implementation, and that it is completely unclear whether a
convergent and stable implementation exists at all. Checking whether a function lies
in the dense range of an infinite-dimensional operator seems intrinsically unstable to
discretization errors and errors in the function or the operator. Likewise, it is unclear
how to numerically check whether the sequence in Proposition 1 diverges or not.

In other words, the following numerical algorithm is motivated by a rigorous
theoretical result, but it is itself purely heuristic, and we do not have any
theoretical justification for it. Since, to the knowledge of the authors,
no convergent numerical methods are known for the considered problem, we believe
that this algorithm might still be of interest and serve as a first step towards
mathematically rigorously justified algorithms.

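To heuristically check whether $\| R_\alpha^D f \| \to \infty$, we fix suitable constants $\alpha, C > 0$
and $N \in \mathbb{N}$. Consider the finite-dimensional subspace $V_N := \operatorname{span}(\varphi_1, \dots, \varphi_N)$
of $L^2_\diamond(\partial\Omega)$ and the corresponding $L^2$ orthogonal projector $P_N : L^2_\diamond(\partial\Omega) \to V_N$.
Instead of $M_D$, we consider the truncated operator $M_D^N := P_N M_D|_{V_N} : V_N \to V_N$,
which satisfies

\[ \langle M_D^N \varphi_j, \varphi_k \rangle = \langle P_N M_D \varphi_j, \varphi_k \rangle = \langle M_D \varphi_j, \varphi_k \rangle \quad \text{for } 1 \le j, k \le N, \]

so that formula (6) holds for $M_D^N$ as well. We define

\[ \| R_{\alpha,N}^D v \|^2 := \langle (M_D^N + \alpha I)^{-1} M_D^N (M_D^N + \alpha I)^{-1} P_N v, P_N v \rangle \quad \text{for all } v \in L^2_\diamond(\partial\Omega) \]

and note that

\[ \| R_{\alpha,N}^D f \| \to \| R_\alpha^D f \| \quad \text{as } N \to \infty \]

follows from a discussion, which involves a variant of the Banach Lemma.
Therefore, the use of the criterion

\[ \| R_{\alpha,N}^D f \|^2 \le C \]

in place of the divergence test $\| R_\alpha^D f \| \to \infty$ is well-motivated, and we solve

\[ \operatorname{vol}(D) = \min! \quad \text{subject to} \quad D \in \mathcal{K}_c(\mathbb{R}^d), \quad D \subset \Omega, \quad \| R_{\alpha,N}^D f \|^2 \le C \tag{8} \]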


with, e.g., a = 10~* and C = 10° instead of (4). 


4 Galerkin Approximations to $\mathcal{K}_c(\mathbb{R}^d)$


We outline a setting for a first-discretize-then-optimize approach to numerical
optimization in the space $\mathcal{K}_c(\mathbb{R}^d)$, which we use to solve problem (8). To this end,
we define Galerkin subspaces of $\mathcal{K}_c(\mathbb{R}^d)$ in terms of polytopes with prescribed
sets of outer normals. These spaces have good global approximation properties (see
Proposition 2), they possess a unique representation in terms of few coordinates, and
their sets of admissible coordinates are characterized by sparse linear inequalities.
A theory of these spaces in arbitrary dimension is work in progress.

Fix a matrix $A \in \mathbb{R}^{m \times d}$ with rows $a_i^T$, $i = 1, \dots, m$, where $a_i \in \mathbb{R}^d$, $\|a_i\|_2 = 1$
for all $i$ and $a_i \ne a_j$ for all $i, j$ with $i \ne j$. For every $b \in \mathbb{R}^m$, we consider the
convex polyhedron

\[ Q_{A,b} := \{ x \in \mathbb{R}^d : Ax \le b \}, \]

and we define a space $\mathcal{G}_A \subset \mathcal{K}_c(\mathbb{R}^d)$ of convex polyhedra by setting

\[ \mathcal{G}_A := \{ Q_{A,b} : b \in \mathbb{R}^m \} \setminus \{ \emptyset \}. \]


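The choice of these spaces is motivated by an approximation result from [8]. Recall
the definition of the one-sided Hausdorff distance

\[ \operatorname{dist} : \mathcal{K}_c(\mathbb{R}^d) \times \mathcal{K}_c(\mathbb{R}^d) \to \mathbb{R}_+, \quad \operatorname{dist}(D, D') := \sup_{x \in D} \inf_{x' \in D'} \| x - x' \|_2. \]


Proposition 2 Assume that the matrix $A \in \mathbb{R}^{m \times d}$ satisfies

\[ \delta := \max_{x \in \mathbb{R}^d, \ \|x\|_2 = 1} \operatorname{dist}(\{x\}, \{a_1, \dots, a_m\}) < 1. \tag{9} \]

Then the associated space $\mathcal{G}_A$ consists of convex polytopes, and for all $D \in \mathcal{K}_c(\mathbb{R}^d)$,
there exists $Q_{A,b} \in \mathcal{G}_A$ such that $D \subset Q_{A,b}$ and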


82 
dist(D, {0}). 


dist »D)< 
ist(Qa,», D) <= =— 


Hence, if the matrix $A$ is augmented in such a way that $\delta \to 0$ as $m \to \infty$, then $\mathcal{G}_A$
converges to $\mathcal{K}_c(\mathbb{R}^d)$ uniformly on every bounded subset of $\mathcal{K}_c(\mathbb{R}^d)$.

It is, at present, not entirely clear how to represent the spaces $\mathcal{G}_A$ in terms of
coordinates. There are $b \in \mathbb{R}^m$ with $Q_{A,b} = \emptyset$, and two different vectors $b, b' \in
\mathbb{R}^m$ may encode the same polytope $Q_{A,b} = Q_{A,b'}$. In our concrete optimization
problem, the constraint $\mathcal{C} f \subset D$ will enforce $Q_{A,b} \ne \emptyset$. For the time being, we
treat the second issue by forcing all hyperplanes $\{ x \in \mathbb{R}^d : a_k^T x = b_k \}$, $k =
1, \dots, m$, to possess at least one common point with $Q_{A,b}$. This approach will be
made rigorous in the future.


Definition 1 We call the set @4 of all b € R” satisfying 


Ai (i) So. i io) en =) ee eee ) = 
bo yan by 
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with 


the set of admissible coordinates. 


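Note that the inverse matrices above exist when $\delta < 1$ as required in Proposition 2.
Hence it is easy to assemble the sparse matrix

\[
H_A := \begin{pmatrix}
1 & -p_{1,2} & & & -p_{1,1} \\
-p_{2,1} & 1 & -p_{2,2} & & \\
 & \ddots & \ddots & \ddots & \\
 & & -p_{m-1,1} & 1 & -p_{m-1,2} \\
-p_{m,2} & & & -p_{m,1} & 1
\end{pmatrix},
\]

which gives rise to the following characterization of the set $\mathcal{C}_A \subset \mathbb{R}^m$.


Lemma 1 The set of admissible coordinates can be written as

\[ \mathcal{C}_A = \{ b \in \mathbb{R}^m : H_A b \le 0 \}. \]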


All in all, we replaced the relatively inaccessible space $\mathcal{K}_c(\mathbb{R}^d)$ with a Galerkin
subspace $\mathcal{G}_A$ that is parametrized over a set $\mathcal{C}_A \subset \mathbb{R}^m$ of coordinates, which, in
turn, is described by a sparse linear inequality. For the practical computations in this
paper, we fix the matrix $A = (a_1, \dots, a_m)^T$ given by

\[ a_k = (\cos(2k\pi/m), \sin(2k\pi/m))^T, \quad k = 1, \dots, m, \]

which is probably the best choice in the absence of detailed information on the set
to be approximated. As we will have $\Omega = B_1(0)$ in our computational example, we
will replace problem (4) with the fully discrete optimization problem

\[
\begin{aligned}
& \operatorname{vol}(Q_{A,b}) = \min! \\
& \text{subject to} \quad b \in \mathbb{R}^m, \quad H_A b \le 0, \quad b \le 1, \quad \| R_{\alpha,N}^{Q_{A,b}} f \|^2 \le C.
\end{aligned} \tag{10}
\]


5 Gradients of Functions on $\mathcal{G}_A$


The objective function $D \mapsto \operatorname{vol}(D)$ and the constraint $D \mapsto \| R_{\alpha,N}^D f \|^2$ are both
given in terms of integrals of a real-valued function over the set $D$. The evaluation
of these integrals is straightforward and efficient. The efficient evaluation of the
gradients of both integrals with respect to coordinates requires some preparation.
We follow [6, Lemma 2.2] and [7, Theorem 1].


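Proposition 3 Let $b \in \mathcal{C}_A$, let $k \in \{1, \dots, m\}$, and let $Q^k_{A,b}$ be the facet

\[ Q^k_{A,b} = Q_{A,b} \cap \{ x \in \mathbb{R}^d : a_k^T x = b_k \}. \]

If we assume that $\operatorname{vol}_d(Q_{A,b}) > 0$ and $\operatorname{vol}_{d-1}(Q^k_{A,b}) > 0$, then for any continuous
function $h : \mathbb{R}^d \to \mathbb{R}$ we have

\[ \frac{d}{d b_k} \int_{Q_{A,b}} h(x) \, dx = \int_{Q^k_{A,b}} h(\xi) \, d\xi. \]

The above proposition shows that whenever $Q_{A,b}$ is not degenerate, we have

\[ \nabla_b \operatorname{vol}_2(Q_{A,b}) = (\operatorname{vol}_1(Q^1_{A,b}), \dots, \operatorname{vol}_1(Q^m_{A,b}))^T. \]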
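For planar polygons with the equally spaced normals $a_k = (\cos(2k\pi/m), \sin(2k\pi/m))^T$, the statement that the gradient of the area is the vector of facet lengths can be checked against finite differences. The sketch below assumes that every facet is active, so that consecutive boundary lines intersect in the vertices of the polygon; the helper names are hypothetical.

```python
import numpy as np

def normals(m):
    """Rows a_k = (cos(2 k pi / m), sin(2 k pi / m)), k = 1, ..., m."""
    ang = 2 * np.pi * np.arange(1, m + 1) / m
    return np.column_stack([np.cos(ang), np.sin(ang)])

def vertices(A, b):
    """Vertex k is the intersection of the boundary lines k and k + 1 (cyclic)."""
    m = len(b)
    V = np.empty((m, 2))
    for k in range(m):
        j = (k + 1) % m
        V[k] = np.linalg.solve(A[[k, j]], b[[k, j]])
    return V

def area(A, b):
    """Shoelace formula; vertices are ordered counterclockwise."""
    V = vertices(A, b)
    x, y = V[:, 0], V[:, 1]
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)

def facet_lengths(A, b):
    """Facet k joins vertex k - 1 and vertex k."""
    V = vertices(A, b)
    return np.linalg.norm(V - np.roll(V, 1, axis=0), axis=1)
```

For `b = np.ones(8)` the polygon is the regular octagon circumscribed about the unit circle, with area `8 * tan(pi / 8)` and facet length `2 * tan(pi / 8)`.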


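To compute $\nabla_b \| R_{\alpha,N}^{Q_{A,b}} f \|^2$, we need the following lemma. The construction of the
matrices $X$, $Y$ and $Z$ reduces the cost of computing the desired derivative.


Lemma 2 Let $\varepsilon > 0$, let $M : (-\varepsilon, \varepsilon) \to \mathbb{R}^{N \times N}$, $\gamma \mapsto M(\gamma)$, be differentiable
with $M(\gamma)$ symmetric and $M(\gamma) + \alpha I$ invertible for all $\gamma \in (-\varepsilon, \varepsilon)$. Using the
abbreviations

\[ X := M + \alpha I, \quad Y := X^{-1} M' X^{-1} \quad \text{and} \quad Z := M X^{-1}, \]

we find that

\[ (X^{-1} M X^{-1})' = -YZ + Y - (YZ)^T. \]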

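The proof is elementary and therefore omitted. An application of Lemma 2 to the
matrix representation of $M^N_{Q_{A,b}}$ yields

\[
\begin{aligned}
\frac{d}{d b_k} \| R_{\alpha,N}^{Q_{A,b}} f \|^2 &= \frac{d}{d b_k} \langle (M^N_{Q_{A,b}} + \alpha I)^{-1} M^N_{Q_{A,b}} (M^N_{Q_{A,b}} + \alpha I)^{-1} f, f \rangle \\
&= \langle (-YZ + Y - (YZ)^T) f, f \rangle
\end{aligned}
\]

with the abbreviations

\[ X := M^N_{Q_{A,b}} + \alpha I, \quad Y := X^{-1} \Big( \frac{d}{d b_k} M^N_{Q_{A,b}} \Big) X^{-1} \quad \text{and} \quad Z := M^N_{Q_{A,b}} X^{-1}, \]

where

\[ \Big( \frac{d}{d b_k} M^N_{Q_{A,b}} \Big)_{ij} = \int_{Q^k_{A,b}} \nabla u_0^i \cdot \nabla u_0^j \, d\xi \]

by Proposition 3. Thus we obtain a formula for $\nabla_b \| R_{\alpha,N}^{Q_{A,b}} f \|^2$ that is not only more
precise than a numerical approximation by finite differences, but also much cheaper
to compute, because the area of integration is just a lower-dimensional surface.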


6 A First Numerical Simulation 


We test our numerical algorithm on a simple 2d example, where all quantities are
known explicitly and the algorithm can be observed under controlled conditions. Let
$\Omega = B_1(0)$ be the unit disc and let $\sigma_0 = 1$. We consider a point inhomogeneity
which leads to a difference potential (cf. [1]) 


that solves the partial differential equation 
\[ \Delta w = \eta \cdot \nabla \delta_{z^*} \ \text{in } \Omega, \qquad \sigma_0 \partial_\nu w|_{\partial\Omega} = 0, \]

where $z^*$ is the location of the point inhomogeneity, and $\eta \in \mathbb{R}^2$, $\|\eta\|_2 = 1$ is a
dipole orientation vector depending on the applied current pattern. Using a standard
smoothing argument, it is easily checked (see, e.g., [4]) that for each open set $O$
containing $z^*$ there exists $F \in L^2(O)^2$ so that

\[ \Delta w = \operatorname{div} F. \]

Hence the convex source support of the difference measurement $w|_{\partial\Omega}$ is the
inhomogeneity location z*. In our example we used z* = (3, =a and 7 = 
(0)? 


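In the following computations, it is convenient to switch between standard
coordinates $(x_1, x_2)$ and polar coordinates $(r, \xi)$. Consider the basis

\[ \varphi_{2j}(\xi) = \frac{1}{\sqrt{\pi}} \cos(j\xi), \quad \varphi_{2j+1}(\xi) = \frac{1}{\sqrt{\pi}} \sin(j\xi), \quad j \in \mathbb{N}_1, \]

of $L^2_\diamond(\partial\Omega)$. Since the Laplace operator satisfies

\[ u_{x_1 x_1} + u_{x_2 x_2} = u_{rr} + \frac{1}{r} u_r + \frac{1}{r^2} u_{\xi\xi}, \]

it is easy to see that the corresponding solutions of problem (7) are

\[ u_0^{2j}(r, \xi) = \frac{1}{j\sqrt{\pi}} \cos(j\xi) \, r^j, \quad u_0^{2j+1}(r, \xi) = \frac{1}{j\sqrt{\pi}} \sin(j\xi) \, r^j, \quad j \in \mathbb{N}_1. \]

Since the gradient satisfies

\[ u_{x_1} = \cos\xi \cdot u_r - \frac{1}{r} \sin\xi \cdot u_\xi, \quad u_{x_2} = \sin\xi \cdot u_r + \frac{1}{r} \cos\xi \cdot u_\xi, \]

we have explicit representations

\[
\begin{aligned}
\frac{\partial}{\partial x_1} u_0^{2j} &= \frac{1}{\sqrt{\pi}} r^{j-1} \cos((j-1)\xi), & \frac{\partial}{\partial x_2} u_0^{2j} &= -\frac{1}{\sqrt{\pi}} r^{j-1} \sin((j-1)\xi), \\
\frac{\partial}{\partial x_1} u_0^{2j+1} &= \frac{1}{\sqrt{\pi}} r^{j-1} \sin((j-1)\xi), & \frac{\partial}{\partial x_2} u_0^{2j+1} &= \frac{1}{\sqrt{\pi}} r^{j-1} \cos((j-1)\xi).
\end{aligned}
\]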
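The harmonic functions $r^j \cos(j\xi)/(j\sqrt{\pi})$ and $r^j \sin(j\xi)/(j\sqrt{\pi})$ and their Cartesian gradients can be coded directly, which makes it possible to confirm the closed-form gradients by finite differences. The function names below are hypothetical, and the gradient formulas are those obtained by differentiating the real and imaginary parts of $z^j$.

```python
import numpy as np

SQRT_PI = np.sqrt(np.pi)

def polar(x1, x2):
    return np.hypot(x1, x2), np.arctan2(x2, x1)

def u_cos(j, x1, x2):
    """Solution of (7) for boundary data cos(j xi) / sqrt(pi) on the unit disc."""
    r, xi = polar(x1, x2)
    return r**j * np.cos(j * xi) / (j * SQRT_PI)

def u_sin(j, x1, x2):
    """Solution of (7) for boundary data sin(j xi) / sqrt(pi) on the unit disc."""
    r, xi = polar(x1, x2)
    return r**j * np.sin(j * xi) / (j * SQRT_PI)

def grad_u_cos(j, x1, x2):
    r, xi = polar(x1, x2)
    c = r**(j - 1) / SQRT_PI
    return c * np.cos((j - 1) * xi), -c * np.sin((j - 1) * xi)

def grad_u_sin(j, x1, x2):
    r, xi = polar(x1, x2)
    c = r**(j - 1) / SQRT_PI
    return c * np.sin((j - 1) * xi), c * np.cos((j - 1) * xi)
```

Products of these gradients are the integrands in (6), so the Gram matrix entries for a polygon reduce to integrals of explicit trigonometric polynomials.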
Now we fix the matrix $A = (a_1, \dots, a_8)^T$ given by


\[ a_k = (\cos(k\pi/4), \sin(k\pi/4))^T, \quad k = 1, \dots, 8, \]


and solve optimization problem (4) approximately by applying Matlab’s interior 
point method to problem (10) with initial value bb = (3, er 3)! and N = 6, 
computing values and gradients of the objective $b \mapsto \operatorname{vol}_2(Q_{A,b})$ and the constraint
$b \mapsto \| R_{\alpha,N}^{Q_{A,b}} w|_{\partial\Omega} \|^2$ as in Sect. 5. The results and the computation times on an
ordinary desktop computer are displayed in Fig. 1.

[Figure 1: six panels, both axes ranging from -1 to 1. Panel titles: step 1, time 0.037279 sec;
step 2, time 0.037482 sec; step 4, time 0.135292 sec; step 8, time 0.247123 sec;
step 12, time 0.337200 sec; step 14, time 0.579555 sec.]

Fig. 1 Selected iterates of Matlab's interior point optimization tool applied to (10) with data
specified in Sect. 6. The current iterate is highlighted in black. The position of the dipole is the
center of the red circle


Local Volatility Calibration by Optimal Transport


Ivan Guo, Grégoire Loeper, and Shiyi Wang 


Abstract The calibration of volatility models from observable option prices is a 
fundamental problem in quantitative finance. The most common approach among 
industry practitioners is based on the celebrated Dupire’s formula, which requires 
the knowledge of vanilla option prices for a continuum of strikes and maturities 
that can only be obtained via some form of price interpolation. In this paper, we 
propose a new local volatility calibration technique using the theory of optimal 
transport. We formulate a time continuous martingale optimal transport problem, 
which seeks a martingale diffusion process that matches the known densities of an 
asset price at two different dates, while minimizing a chosen cost function. Inspired 
by the seminal work of Benamou and Brenier, we formulate the problem as a convex 
optimization problem, derive its dual formulation, and solve it numerically via an 
augmented Lagrangian method and the alternating direction method of multipliers
(ADMM) algorithm. The solution effectively reconstructs the dynamics of the asset
price between the two dates by recovering the optimal local volatility function, 
without requiring any time interpolation of the option prices. 


1 Introduction 


A fundamental assumption of the classical Black-Scholes option pricing framework
is that the underlying risky asset has a constant volatility. However, this assumption
can be easily dispelled by the option prices observed in the market, where the
implied volatility surfaces are known to exhibit "skews" or "smiles". Over the
years, many sophisticated volatility models have been introduced to explain this
phenomenon. One popular class is that of local volatility models. In a local
volatility model, the volatility function $\sigma(t, S_t)$ is a function of time $t$ as well as the
asset price $S_t$. The calibration of the local volatility function involves determining
$\sigma$ from available option prices.


I. Guo · G. Loeper
School of Mathematical Sciences, Monash University, Clayton, VIC, Australia

Centre for Quantitative Finance and Investment Strategies, Monash University, Clayton, VIC,
Australia

S. Wang (✉)
School of Mathematical Sciences, Monash University, Clayton, VIC, Australia
e-mail: Leo.Wang@monash.edu

© Springer Nature Switzerland AG 2019
D. R. Wood et al. (eds.), 2017 MATRIX Annals, MATRIX Book Series 2,
https://doi.org/10.1007/978-3-030-04161-8_5

One of the most prominent approaches for calibrating local volatility was
introduced in the path-breaking work of Dupire [6], which provides a method to recover
the local volatility function $\sigma(t, s)$ if the prices of European call options $C(T, K)$
are known for a continuum of maturities $T$ and strikes $K$. In particular, the famous
Dupire's formula is given by

\[ \sigma^2(T, K) = \frac{\dfrac{\partial C(T, K)}{\partial T} + \mu_T K \dfrac{\partial C(T, K)}{\partial K}}{\dfrac{K^2}{2} \dfrac{\partial^2 C(T, K)}{\partial K^2}}, \tag{1} \]

where $\mu_t$ is a deterministic function. However, in practice, option prices are only
available at discrete strikes and maturities, hence interpolation is required in both
variables to utilize this formula, leading to many inaccuracies. Furthermore, the
numerical evaluation of the second derivative in the denominator can potentially
cause instabilities in the volatility surface as well as singularities. Despite these
drawbacks, Dupire's formula and its variants are still used prevalently in the
financial industry today.
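To make the role of the derivatives in Dupire's formula concrete, the following sketch evaluates it by central finite differences on synthetic Black-Scholes call prices. It assumes a vanishing drift term ($\mu_t = 0$, i.e. zero rates), a spot of 100 and a constant volatility of 0.2; these values and all function names are illustrative assumptions.

```python
from math import erf, log, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(spot, strike, maturity, vol):
    """Black-Scholes call price with zero interest and dividend rates."""
    d1 = (log(spot / strike) + 0.5 * vol**2 * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * norm_cdf(d2)

def dupire_local_vol(call, maturity, strike, d_t=1e-4, d_k=1e-2):
    """Local volatility from a price function call(T, K), assuming zero drift."""
    dc_dt = (call(maturity + d_t, strike) - call(maturity - d_t, strike)) / (2 * d_t)
    d2c_dk2 = (call(maturity, strike + d_k) - 2 * call(maturity, strike)
               + call(maturity, strike - d_k)) / d_k**2
    return sqrt(dc_dt / (0.5 * strike**2 * d2c_dk2))

prices = lambda T, K: bs_call(100.0, K, T, 0.2)
print(dupire_local_vol(prices, 1.0, 100.0))
```

With exact prices the constant volatility is recovered closely; with interpolated market quotes the second difference in the denominator is where the instabilities mentioned above tend to enter.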

In this paper, we introduce a new technique for the calibration of local volatility 
functions that adopts a variational approach inspired by optimal transport. The 
optimal transport problem was first proposed by Monge [10] in 1781 in the 
context of civil engineering. The basic problem is to transfer material from one 
site to another while minimizing transportation cost. In the 1940s, Kantorovich 
[8] provided a modern treatment of the problem based on linear programming 
techniques, leading to the so-called Monge-Kantorovich problem. Since then, the 
theory of optimal transport has attracted considerable attention with applications 
in many areas such as fluid dynamics, meteorology and econometrics (see, e.g., 
[7] and [14]). Recently, there have been a few studies extending optimal transport 
to stochastic settings with applications in financial mathematics. For instance, 
Tan and Touzi [13] studied an extension of the Monge-Kantorovich problem for 
semimartingales, while Dolinsky and Soner [5] applied martingale optimal transport 
to the problem of robust hedging. 

In our approach, we begin by recovering the probability density of the underlying
asset at times $t_0$ and $t_1$ from the prices of European options expiring at $t_0$ and $t_1$.
Then, instead of interpolating between different maturities, we seek a martingale
diffusion process which transports the density from $t_0$ to $t_1$, while minimizing a
particular cost function. This is similar to the classical optimal transport problem,
with the additional constraint that the diffusion process must be a martingale driven
by a local volatility function. In the case where the cost function is convex, we find
that the problem can be reformulated as a convex optimization problem under linear
constraints. Theoretically, the stochastic control problem can be reformulated as an
optimization problem which involves solving a non-linear PDE at each step, and the
PDE is closely connected with the ones studied in Bouchard et al. [2, 3] and Loeper
[9] in the context of option pricing with market impact. For this paper, we approach
the problem via the augmented Lagrangian method and the alternating direction
method of multipliers (ADMM) algorithm, which was also used in Benamou and
Brenier [1] for classical optimal transport problems.

The paper is organized as follows. In Sect. 2, we introduce the classical optimal 
transport problem as formulated by Benamou and Brenier [1]. In Sect.3, we 
introduce the martingale optimal transport problem and its augmented Lagrangian. 
The numerical method is detailed in Sect.4 and numerical results are given in 
Sect. 5. 


2 Optimal Transport 


In this section, we briefly outline the optimal transport problem as formulated by
Benamou and Brenier [1]. Consider density functions $\rho_0, \rho_1 : \mathbb{R}^d \to [0, \infty)$ with
equal total mass $\int_{\mathbb{R}^d} \rho_0(x) \, dx = \int_{\mathbb{R}^d} \rho_1(x) \, dx$. We say that a map $s : \mathbb{R}^d \to \mathbb{R}^d$ is an
admissible transport plan if it satisfies

\[ \int_{x \in A} \rho_1(x) \, dx = \int_{s(x) \in A} \rho_0(x) \, dx, \tag{2} \]

for all bounded subsets $A \subset \mathbb{R}^d$. Let $\mathcal{T}$ denote the collection of all admissible maps.
Given a cost function $c(x, y)$, which represents the transportation cost of moving
one unit of mass from $x$ to $y$, the optimal transport problem is to find an optimal
map $s^* \in \mathcal{T}$ that minimizes the total cost

\[ \inf_{s \in \mathcal{T}} \int_{\mathbb{R}^d} c(x, s(x)) \rho_0(x) \, dx. \tag{3} \]

In particular, when $c(x, y) = |y - x|^2$ where $| \cdot |$ denotes the Euclidean norm, this
problem is known as the $L^2$ Monge-Kantorovich problem (MKP).
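In one dimension, and for two empirical measures with the same number of equally weighted atoms, the quadratic-cost problem has a simple solution: the monotone pairing of sorted points is optimal. The sketch below uses this discrete analogue of (3) and checks it against a brute-force search over all pairings; it is an illustration only and not the method used later in the paper.

```python
import itertools
import numpy as np

def quadratic_cost_sorted(x, y):
    """Average squared displacement of the monotone (sorted) pairing."""
    return float(np.mean((np.sort(y) - np.sort(x)) ** 2))

def quadratic_cost_bruteforce(x, y):
    """Minimum over all one-to-one pairings; feasible only for tiny samples."""
    n = len(x)
    return min(float(np.mean((x - y[list(p)]) ** 2))
               for p in itertools.permutations(range(n)))

x = np.array([0.3, -1.2, 2.0, 0.9])
y = np.array([1.1, 0.0, -0.5, 3.0])
print(quadratic_cost_sorted(x, y), quadratic_cost_bruteforce(x, y))
```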

The $L^2$ MKP is reformulated in [1] in a fluid mechanics framework. In the time
interval $t \in [0, 1]$, consider all possible smooth, time-dependent densities $\rho(t, x) \ge 0$
and velocity fields $v(t, x) \in \mathbb{R}^d$ that satisfy the continuity equation

\[ \partial_t \rho(t, x) + \nabla \cdot (\rho(t, x) v(t, x)) = 0, \quad \forall t \in [0, 1], \ \forall x \in \mathbb{R}^d, \tag{4} \]

and the initial and final conditions

\[ \rho(0, x) = \rho_0, \quad \rho(1, x) = \rho_1. \tag{5} \]

In [1], it is proven that the $L^2$ MKP is equivalent to finding an optimal pair $(\rho^*, v^*)$
that minimizes

\[ \inf_{\rho, v} \int_{\mathbb{R}^d} \int_0^1 \rho(t, x) |v(t, x)|^2 \, dt \, dx, \tag{6} \]

subject to the constraints (4) and (5). This problem is then solved numerically in
[1] via an augmented Lagrangian approach. The specific numerical algorithm used
is known as the alternating direction method of multipliers (ADMM), which has
applications in statistical learning and distributed optimization.


3 Formulation 


3.1 The Martingale Problem 


Let $(\Omega, \mathbb{F}, \mathbb{Q})$ be a probability space, where $\mathbb{Q}$ is the risk-neutral measure. Suppose
the dynamics of an asset price $X_t$ on $t \in [0, 1]$ are given by the local volatility model

\[ dX_t = \sigma(t, X_t) \, dW_t, \quad t \in [0, 1], \tag{7} \]

where $\sigma(t, x)$ is a local volatility function and $W_t$ is a one-dimensional Brownian
motion. For the sake of simplicity, suppose the interest and dividend rates are zero.
Denote by $\rho(t, x)$ the density function of $X_t$ and $\gamma(t, x) = \sigma(t, x)^2 / 2$ the diffusion
coefficient. It is well known that $\rho(t, x)$ follows the Fokker-Planck equation

\[ \partial_t \rho(t, x) - \partial_{xx} (\rho(t, x) \gamma(t, x)) = 0. \tag{8} \]

Suppose that the initial and the final densities are given by $\rho_0(x)$ and $\rho_1(x)$,
which are recovered from European option prices via the Breeden-Litzenberger [4]
formula,

\[ \rho_T(K) = \frac{\partial^2 C(T, K)}{\partial K^2}. \]

Let $F : \mathbb{R} \to \mathbb{R} \cup \{+\infty\}$ be a convex cost function. We are interested in
minimizing the quantity

\[ \mathbb{E} \Big( \int_0^1 F(\gamma(t, X_t)) \, dt \Big) = \int_D \int_0^1 \rho(t, x) F(\gamma(t, x)) \, dt \, dx, \]

where $F(x) = +\infty$ if $x < 0$, and $D \subset \mathbb{R}$ is the support of $\{X_t, t \in [0, 1]\}$. Unlike
the classical optimal transport problem, the existence of a solution here requires an
additional condition: there exists a martingale transport plan if and only if $\rho_0$ and
$\rho_1$ satisfy:

\[ \int_{\mathbb{R}} \varphi(x) \rho_0(x) \, dx \le \int_{\mathbb{R}} \varphi(x) \rho_1(x) \, dx, \]

for all convex functions $\varphi : \mathbb{R} \to \mathbb{R}$. This is known as Strassen's Theorem [12].
This condition is naturally satisfied by financial models in which the asset price
follows a martingale diffusion process.


Remark 1 The formulation here is actually quite general and it can be easily
adapted to a large family of models. For example, the case of a geometric Brownian
motion with local volatility can be recovered by substituting $\tilde{\sigma}(t, X_t) X_t = \sigma(t, X_t)$
everywhere, including in the Fokker-Planck equation. The cost function $F$ would
then also be dependent on $x$. The later arguments involving convex conjugates still
hold since $F$ remains a convex function of $\sigma$.


Since $\rho F(\gamma)$ is not convex in $(\rho, \gamma)$ (which is crucial for our method), the
substitution $m(t, x) := \rho(t, x) \gamma(t, x)$ is applied. So we obtain the following
martingale optimal transport problem:

\[ \inf_{\rho, m} \int_D \int_0^1 \rho(t, x) F\Big( \frac{m(t, x)}{\rho(t, x)} \Big) \, dt \, dx, \tag{9} \]

subject to the constraints:

\[ \rho(0, x) = \rho_0(x), \quad \rho(1, x) = \rho_1(x), \tag{10} \]
\[ \partial_t \rho(t, x) - \partial_{xx} m(t, x) = 0. \tag{11} \]

Using the convexity of $F$, the term $\rho F(m/\rho)$ can be easily verified to be convex in
$(\rho, m)$. Also note that we have the natural restrictions of $\rho > 0$ and $m > 0$. Note
that $m > 0$ is enforced by penalizing the cost function $F$, and $\rho > 0$ will be encoded
in the convex conjugate formulation (see Proposition 1).
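The effect of the substitution can be seen on a small example. For the hypothetical cost $F(\gamma) = (\gamma - 1)^2$, the sketch below evaluates the midpoint convexity gap of $\rho F(\gamma)$ and of $\rho F(m/\rho)$; a positive gap means that the function value at the midpoint exceeds the average of the endpoint values, which contradicts convexity.

```python
import numpy as np

def F(gamma):
    """Hypothetical convex cost, chosen only for illustration."""
    return (gamma - 1.0) ** 2

def cost_rho_gamma(rho, gamma):
    return rho * F(gamma)

def cost_rho_m(rho, m):
    """Perspective form rho * F(m / rho), defined for rho > 0."""
    return rho * F(m / rho)

def midpoint_gap(func, p, q):
    """func(midpoint) minus the average of func(p) and func(q)."""
    mid = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return func(*mid) - 0.5 * (func(*p) + func(*q))

# the same two states, once in (rho, gamma) and once in (rho, m = rho * gamma)
print(midpoint_gap(cost_rho_gamma, (1.5, 1.5), (0.5, 2.5)))
print(midpoint_gap(cost_rho_m, (1.5, 2.25), (0.5, 1.25)))
```

The first gap is positive and the second is negative, in line with the convexity of the perspective function for positive $\rho$.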

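Next, introduce a time-space dependent Lagrange multiplier $\phi(t, x)$ for the
constraints (10) and (11). Hence the associated Lagrangian is

\[ L(\phi, \rho, m) = \int_{\mathbb{R}} \int_0^1 \Big[ \rho(t, x) F\Big( \frac{m(t, x)}{\rho(t, x)} \Big) + \phi(t, x) \big( \partial_t \rho(t, x) - \partial_{xx} m(t, x) \big) \Big] \, dt \, dx. \tag{12} \]

Integrating (12) by parts and letting $m = \rho\gamma$ vanish on the boundaries of $D$, the
martingale optimal transport problem can be reformulated as the following saddle
point problem:

\[
\begin{aligned}
\inf_{\rho, m} \sup_{\phi} L(\phi, \rho, m) = \inf_{\rho, m} \sup_{\phi} & \int_D \int_0^1 \Big( \rho F\Big( \frac{m}{\rho} \Big) - \rho \, \partial_t \phi - m \, \partial_{xx} \phi \Big) \, dt \, dx \\
& - \int_D \big( \phi(0, x) \rho_0 - \phi(1, x) \rho_1 \big) \, dx.
\end{aligned} \tag{13}
\]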

As shown by Theorem 3.6 in [13], (13) has an equivalent dual formulation which
leads to the following representation:

$$\sup_{\phi} \inf_{\rho, m} L(\phi,\rho,m) = \sup_{\phi} \inf_{\rho} \int_D \int_0^1 -\rho \left( \partial_t\phi + F^*(\partial_{xx}\phi) \right) dt\,dx - \int_D \left(\phi(0,x)\rho_0 - \phi(1,x)\rho_1\right) dx. \tag{14}$$

In particular, the optimal $\phi$ must satisfy the condition

$$\partial_t\phi + F^*(\partial_{xx}\phi) = 0, \tag{15}$$

where $F^*$ is the convex conjugate of $F$ (see (16) and Proposition 1). We will later
use (15) to check the optimality of our algorithm.


3.2 Augmented Lagrangian Approach 


Similar to [1], we solve the martingale optimal transport problem using the
augmented Lagrangian approach. Let us begin by briefly recalling the well-known
definition and properties of the convex conjugate. For more details, readers are
referred to Section 12 of Rockafellar [11].

Fix $D \subset \mathbb{R}^d$ and let $f : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ be a proper, convex and lower
semi-continuous function. Then the convex conjugate of $f$ is the function
$f^* : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ defined by

$$f^*(y) = \sup_{x \in \mathbb{R}^d} \left( x \cdot y - f(x) \right). \tag{16}$$


The convex conjugate is also often known as the Legendre-Fenchel transform. 
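As a concrete illustration of definition (16), the sketch below evaluates the conjugate of a quadratic cost of the kind used later in the numerical example, assuming $F(\gamma) = (\gamma - \bar{\gamma})^2$ for $\gamma \ge 0$ and $+\infty$ otherwise. The closed form is derived here from the first-order condition and is not quoted from the text; the function names and the grid are hypothetical.

```python
import numpy as np

GAMMA_BAR = 0.00375  # target value, taken from the numerical example


def cost(gamma):
    """Assumed cost: (gamma - GAMMA_BAR)^2 for gamma >= 0, +inf otherwise."""
    gamma = np.asarray(gamma, dtype=float)
    return np.where(gamma >= 0.0, (gamma - GAMMA_BAR) ** 2, np.inf)


def conjugate_closed_form(b):
    """F*(b) = sup over gamma >= 0 of (b * gamma - (gamma - GAMMA_BAR)^2)."""
    gamma_star = np.maximum(GAMMA_BAR + 0.5 * b, 0.0)  # constrained maximiser
    return b * gamma_star - (gamma_star - GAMMA_BAR) ** 2


def conjugate_on_grid(b, gamma_grid):
    """Brute-force evaluation of definition (16) on a finite grid."""
    return float(np.max(b * gamma_grid - cost(gamma_grid)))


gamma_grid = np.linspace(0.0, 1.0, 200001)
for b in (-1.0, -0.0075, 0.0, 0.5):
    print(b, conjugate_closed_form(b), conjugate_on_grid(b, gamma_grid))
```

For $b \ge -2\bar{\gamma}$ the maximiser is interior and $F^*(b) = b\bar{\gamma} + b^2/4$; for smaller $b$ the constraint $\gamma \ge 0$ is active and $F^*(b) = -\bar{\gamma}^2$.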
Proposition 1 We have the following properties: 


(i) f* is a proper convex and lower semi-continuous function with f** = f; 


(ii) if $f$ is differentiable, then $f(x) + f^*(f'(x)) = x f'(x)$.

Returning to the problem at hand, recall that $G(x, y) := xF(y/x)$, $x > 0$, is
convex in $(x, y)$. By adopting the convention that $G(x, y) = \infty$ whenever $x \le 0$,
it can be expressed in terms of the convex conjugate, as shown in the following
proposition.


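Proposition 2 Denote by $F^*$ the convex conjugate of $F$.

(i) Let $G(x, y) = xF(y/x)$. The convex conjugate of $G$ is given by

$$G^*(a,b) = \begin{cases} 0, & \text{if } a + F^*(b) \le 0, \\ \infty, & \text{otherwise.} \end{cases} \tag{17}$$

(ii) For $x > 0$, we have the following equality:

$$xF\!\left(\frac{y}{x}\right) = \sup_{(a,b)\in\mathbb{R}^2} \left\{ ax + by : a + F^*(b) \le 0 \right\}. \tag{18}$$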
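Proof
(i) By definition, the convex conjugate of $G$ is given by

$$G^*(a,b) = \sup_{(x,y)\in\mathbb{R}^2} \left\{ ax + by - xF\!\left(\frac{y}{x}\right) : x > 0 \right\} \tag{19}$$

$$= \sup_{(x,y)\in\mathbb{R}^2} \left\{ ax + x\left( b\,\frac{y}{x} - F\!\left(\frac{y}{x}\right) \right) : x > 0 \right\} \tag{20}$$

$$= \sup_{x > 0} \left\{ x\left(a + F^*(b)\right) \right\}. \tag{21}$$

If $a + F^*(b) \le 0$, the supremum is attained in the limit $x \to 0$ and equals zero; otherwise, $G^*$
becomes unbounded as $x$ increases. This establishes part (i).
(ii) The required equality follows immediately from part (i) and the fact that

$$xF\!\left(\frac{y}{x}\right) = \sup_{(a,b)\in\mathbb{R}^2} \left\{ ax + by - G^*(a,b) : a + F^*(b) \le 0 \right\}.$$

□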


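Now we are in a position to present the augmented Lagrangian. First, let us
introduce the following notation:

$$K = \left\{ (a,b) : \mathbb{R}\times\mathbb{R} \to \mathbb{R}\times\mathbb{R} \;\middle|\; a + F^*(b) \le 0 \right\}, \tag{22}$$

$$\mu = (\rho, m) = (\rho, \rho\gamma), \qquad q = (a,b), \qquad \langle \mu, q \rangle = \int_D \int_0^1 \mu \cdot q, \tag{23}$$

$$H(q) = G^*(a,b) = \begin{cases} 0, & \text{if } q \in K, \\ \infty, & \text{otherwise,} \end{cases} \tag{24}$$

$$J(\phi) = \int_D \left[ \phi(0,x)\rho_0 - \phi(1,x)\rho_1 \right], \tag{25}$$

$$\nabla_{t,xx} = (\partial_t, \partial_{xx}). \tag{26}$$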


Using the above notation, we can express the equality from Proposition 2 (ii) in
the following way:

$$\rho F\!\left(\frac{m}{\rho}\right) = \sup_{\{a,b\}\in K} \{ a\rho + bm \} = \sup_{q\in K} \{ \mu \cdot q \}. \tag{27}$$

Since the restriction $q \in K$ is checked point-wise for every $(t, x)$, we can exchange
the supremum with the integrals in the following equality:

$$\int_D \int_0^1 \sup_{q\in K} \{ \mu \cdot q \} = \sup_{q} \left\{ -H(q) + \int_D \int_0^1 \mu \cdot q \right\} = \sup_{q} \left\{ -H(q) + \langle \mu, q \rangle \right\}. \tag{28}$$

Therefore, the saddle point problem specified by (13) can be rewritten as

$$\sup_{\mu} \inf_{\phi, q} \; H(q) + J(\phi) + \langle \mu, \nabla_{t,xx}\phi - q \rangle. \tag{29}$$

Note that in the new saddle point problem (29), $\mu$ is the Lagrange multiplier of the
new constraint $\nabla_{t,xx}\phi = q$. In order to turn this into a convex problem, we define
the augmented Lagrangian as follows:

$$L_r(\phi, q, \mu) = H(q) + J(\phi) + \langle \mu, \nabla_{t,xx}\phi - q \rangle + \frac{r}{2} \langle \nabla_{t,xx}\phi - q, \nabla_{t,xx}\phi - q \rangle, \tag{30}$$

where $r > 0$ is a penalization parameter. The saddle point problem then becomes

$$\sup_{\mu} \inf_{\phi, q} L_r(\phi, q, \mu), \tag{31}$$

which has the same solution as (13).


4 Numerical Method 


In this section, we describe in detail the alternating direction method of multipliers
(ADMM) algorithm for solving the saddle point problem given by (30) and (31). In
each iteration, using $(\phi^{n-1}, q^{n-1}, \mu^{n-1})$ as a starting point, the ADMM algorithm
performs the following three steps:

$$\text{Step A:}\quad \phi^n = \arg\min_{\phi} L_r(\phi, q^{n-1}, \mu^{n-1}), \tag{32}$$

$$\text{Step B:}\quad q^n = \arg\min_{q} L_r(\phi^n, q, \mu^{n-1}), \tag{33}$$

$$\text{Step C:}\quad \mu^n = \arg\max_{\mu} L_r(\phi^n, q^n, \mu). \tag{34}$$
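The three-step structure can be seen on a much smaller problem. The sketch below is a toy analogue, not the transport problem itself: it minimises $\tfrac12|\phi - c|^2$ plus the indicator of $\{q \ge 0\}$ under the constraint $\phi = q$, so that Step A is a linear solve, Step B a pointwise projection and Step C a multiplier update, mirroring (32) to (34). The problem, the function name and the parameter values are assumptions chosen for illustration.

```python
import numpy as np


def admm_toy(c, r=1.0, iterations=200):
    """ADMM for min 0.5*|phi - c|^2 + indicator(q >= 0) subject to phi = q."""
    c = np.asarray(c, dtype=float)
    phi = np.zeros_like(c)
    q = np.zeros_like(c)
    mu = np.zeros_like(c)  # Lagrange multiplier of the constraint phi = q
    for _ in range(iterations):
        # Step A: minimise the augmented Lagrangian over phi (closed form).
        phi = (c - mu + r * q) / (1.0 + r)
        # Step B: minimise over q, a pointwise projection onto {q >= 0}.
        q = np.maximum(phi + mu / r, 0.0)
        # Step C: gradient ascent step on the multiplier.
        mu = mu + r * (phi - q)
    return phi, q, mu


c = np.array([1.0, -1.0, 0.3])
phi, q, mu = admm_toy(c)
print(q)
```

The limit of `q` is the projection of `c` onto the non-negative orthant. In the transport problem the same pattern appears with a fourth-order PDE solve in Step A and a projection onto the set $K$ in Step B.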


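Step A: $\phi^n = \arg\min_{\phi} L_r(\phi, q^{n-1}, \mu^{n-1})$

To find the function $\phi^n$ that minimizes $L_r(\phi, q^{n-1}, \mu^{n-1})$, we set the functional
derivative of $L_r$ with respect to $\phi$ to zero:

$$J(\phi) + \langle \mu^{n-1}, \nabla_{t,xx}\phi \rangle + r \langle \nabla_{t,xx}\phi^n - q^{n-1}, \nabla_{t,xx}\phi \rangle = 0. \tag{35}$$

By integrating by parts, we arrive at the following variational equation

$$-r \left( \partial_{tt}\phi^n - \partial_{xxxx}\phi^n \right) = \partial_t \left( \rho^{n-1} - r a^{n-1} \right) - \partial_{xx} \left( m^{n-1} - r b^{n-1} \right), \tag{36}$$

with Neumann boundary conditions in time, for all $x \in D$:

$$r\,\partial_t\phi^n(0, x) = \rho_0 - \rho^{n-1}(0, x) + r a^{n-1}(0, x), \tag{37}$$

$$r\,\partial_t\phi^n(1, x) = \rho_1 - \rho^{n-1}(1, x) + r a^{n-1}(1, x). \tag{38}$$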


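For the boundary conditions in space, let $D = [\underline{D}, \overline{D}]$. We impose the following
boundary condition on the diffusion coefficient:

$$\gamma(t, \underline{D}) = \gamma(t, \overline{D}) = \bar{\gamma} = \arg\min_{\gamma\in\mathbb{R}} F(\gamma).$$

From (13) and (15), we know that $\partial_{xx}\phi$ is the dual variable of $\gamma$. Since $\bar{\gamma}$ minimizes $F$,
the corresponding $\partial_{xx}\phi$ must be zero. Therefore, we have the following boundary
conditions:

$$\partial_{xx}\phi(t, \underline{D}) = \partial_{xx}\phi(t, \overline{D}) = 0, \qquad \forall t \in [0,1]. \tag{39}$$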


In [1], periodic boundary conditions were used in the spatial dimension and a
perturbed equation was used to yield a unique solution. Since periodic boundary
conditions are inappropriate for martingale diffusion and we are dealing with a
bi-Laplacian term in space, we impose the following additional boundary conditions
in order to enforce a unique solution:

$$\phi(t, \underline{D}) = \phi(t, \overline{D}) = 0, \qquad \forall t \in [0, 1]. \tag{40}$$

Now, the fourth-order linear PDE (36) can be solved numerically by the finite difference
method or the finite element method.
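One possible finite difference treatment of the spatial operator is sketched below. It is an assumption of this example, not a description of the authors' discretisation: with $\phi = 0$ and $\partial_{xx}\phi = 0$ imposed at both ends as in (39) and (40), the discrete fourth derivative can be formed as the square of the Dirichlet second-difference matrix. The grid size and the test function $\sin(\pi x)$ on $[0,1]$ are chosen only to check the construction.

```python
import numpy as np


def dirichlet_laplacian(n, h):
    """Second-difference matrix on n interior nodes with phi = 0 at both ends."""
    main = -2.0 * np.ones(n)
    off = np.ones(n - 1)
    return (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2


n = 200                      # number of interior grid points (illustrative)
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)

d2 = dirichlet_laplacian(n, h)
d4 = d2 @ d2                 # encodes phi = 0 and d_xx phi = 0 on the boundary

phi = np.sin(np.pi * x)      # satisfies both boundary conditions on [0, 1]
error = np.max(np.abs(d4 @ phi - np.pi**4 * phi)) / np.pi**4
print(error)
```

The first application of `d2` assumes $\phi = 0$ outside the interior nodes, and the second assumes the same of the discrete $\partial_{xx}\phi$, which is how the two boundary conditions enter.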


Step B: $q^n = \arg\min_{q} L_r(\phi^n, q, \mu^{n-1})$

Since $H(q)$ is not differentiable, we cannot differentiate $L_r$ with respect to $q$.
Nevertheless, we can obtain $q^n$ by solving the minimization problem

$$\inf_{q} L_r(\phi^n, q, \mu^{n-1}). \tag{41}$$

This is equivalent to solving

$$\inf_{q\in K} \left\langle \nabla_{t,xx}\phi^n + \frac{\mu^{n-1}}{r} - q,\; \nabla_{t,xx}\phi^n + \frac{\mu^{n-1}}{r} - q \right\rangle. \tag{42}$$

Now, let us define

$$p^n(t, x) := \{\alpha^n(t, x), \beta^n(t, x)\} = \nabla_{t,xx}\phi^n(t, x) + \frac{\mu^{n-1}(t, x)}{r}; \tag{43}$$

then we can find $q^n(t, x) = \{a^n(t, x), b^n(t, x)\}$ by solving

$$\inf_{\{a,b\}\in\mathbb{R}\times\mathbb{R}} \left\{ \left(a(t, x) - \alpha^n(t, x)\right)^2 + \left(b(t, x) - \beta^n(t, x)\right)^2 : a + F^*(b) \le 0 \right\} \tag{44}$$

point-wise in space and time. This is a simple one-dimensional projection problem.
If $\{\alpha^n, \beta^n\}$ satisfies the constraint $\alpha^n + F^*(\beta^n) \le 0$, then it is also the minimizer.
Otherwise, the minimum must occur on the boundary $a + F^*(b) = 0$. In this case
we substitute the condition into (44) to obtain

$$\inf_{b\in\mathbb{R}} \left( F^*(b(t, x)) + \alpha^n(t, x) \right)^2 + \left( b(t, x) - \beta^n(t, x) \right)^2, \tag{45}$$

which must be solved point-wise. The minimum of (45) can be found using standard
root finding methods such as Newton's method. In some simple cases it is even
possible to compute the solution analytically.
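A minimal sketch of the point-wise projection in Step B is given below, assuming the quadratic cost $F(\gamma) = (\gamma - \bar{\gamma})^2$ on $\gamma \ge 0$, whose conjugate is $F^*(b) = b\bar{\gamma} + b^2/4$ for $b \ge -2\bar{\gamma}$ and $-\bar{\gamma}^2$ otherwise. Instead of Newton's method it uses bisection on the stationarity condition of (45), a deliberately simple substitute that is easy to vectorise; the bracket $[\min(\beta, -2\bar{\gamma}) - 1,\ \beta]$ and all function names are choices made for this example.

```python
import numpy as np

GAMMA_BAR = 0.00375  # assumed target value of the quadratic cost


def conj(b):
    """F*(b) for the assumed cost (gamma - GAMMA_BAR)^2 on gamma >= 0."""
    g = np.maximum(GAMMA_BAR + 0.5 * b, 0.0)
    return b * g - (g - GAMMA_BAR) ** 2


def conj_prime(b):
    """Derivative of F*, equal to the maximising gamma."""
    return np.maximum(GAMMA_BAR + 0.5 * b, 0.0)


def project_onto_K(alpha, beta, iterations=80):
    """Project each point (alpha, beta) onto {(a, b): a + F*(b) <= 0}."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    inside = alpha + conj(beta) <= 0.0

    def stationarity(b):
        # Half the derivative of the objective in (45) with respect to b.
        return (b - beta) + (conj(b) + alpha) * conj_prime(b)

    lo = np.minimum(beta, -2.0 * GAMMA_BAR) - 1.0  # stationarity < 0 here
    hi = beta.copy()                               # stationarity >= 0 outside K
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        negative = stationarity(mid) < 0.0
        lo = np.where(negative, mid, lo)
        hi = np.where(negative, hi, mid)
    b = 0.5 * (lo + hi)
    a = -conj(b)                                   # boundary a + F*(b) = 0
    return np.where(inside, alpha, a), np.where(inside, beta, b)


alpha = np.array([1.0, -1.0])
beta = np.array([0.0, 0.5])
a, b = project_onto_K(alpha, beta)
print(a, b)
```

Points that already satisfy the constraint are returned unchanged; for the others, the displacement $(\alpha - a, \beta - b)$ is parallel to the normal $(1, F^{*\prime}(b))$ of the boundary, which is the optimality condition of the projection.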

Step C: $\mu^n = \arg\max_{\mu} L_r(\phi^n, q^n, \mu)$

Begin by computing the gradient, obtained by differentiating the augmented Lagrangian $L_r$
with respect to $\mu$. Then update $\mu$ by moving it point-wise along the gradient as
follows:

$$\mu^n(t, x) = \mu^{n-1}(t, x) + r \left( \nabla_{t,xx}\phi^n(t, x) - q^n(t, x) \right). \tag{46}$$

Stopping criteria:
Recall the HJB equation (15):

$$\partial_t\phi + F^*(\partial_{xx}\phi) = 0. \tag{47}$$

We use (47) to check for optimality. Define the residual

$$\mathrm{res}^n = \max_{t\in[0,1],\, x\in D} \rho \left| \partial_t\phi + F^*(\partial_{xx}\phi) \right|. \tag{48}$$

This quantity converges to 0 as the iterates approach the optimal solution of the problem.
The residual is weighted by the density $\rho$ to alleviate any potential issues caused by
small values of $\rho$.
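The weighted residual (48) can be evaluated on a space-time grid with central differences. The sketch below is one possible implementation under stated assumptions: arrays are indexed as `[time, space]`, only interior nodes are used, and the conjugate passed in the usage example is the interior branch $F^*(b) = \bar{\gamma} b + b^2/4$ of the assumed quadratic cost. The function and variable names are hypothetical.

```python
import numpy as np


def weighted_hjb_residual(phi, rho, dt, dx, conjugate):
    """Max over interior nodes of rho * |d_t phi + F*(d_xx phi)|; axis 0 is time."""
    phi_t = (phi[2:, 1:-1] - phi[:-2, 1:-1]) / (2.0 * dt)
    phi_xx = (phi[1:-1, 2:] - 2.0 * phi[1:-1, 1:-1] + phi[1:-1, :-2]) / dx**2
    return float(np.max(rho[1:-1, 1:-1] * np.abs(phi_t + conjugate(phi_xx))))


# Usage on an exact solution of (47): phi = k*x^2/2 - F*(k)*t with constant k.
GAMMA_BAR = 0.00375
conjugate = lambda b: GAMMA_BAR * b + 0.25 * b**2   # valid for b >= -2*GAMMA_BAR
k = 0.01
t = np.linspace(0.0, 1.0, 65)
x = np.linspace(0.0, 1.0, 65)
tt, xx = np.meshgrid(t, x, indexing="ij")
phi = 0.5 * k * xx**2 - conjugate(k) * tt
rho = np.ones_like(phi)
residual = weighted_hjb_residual(phi, rho, t[1] - t[0], x[1] - x[0], conjugate)
print(residual)
```

Because the test function is quadratic in $x$ and linear in $t$, the central differences are exact up to rounding and the residual is at round-off level.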


5 Numerical Results 


The algorithm was implemented and tested on the following simple example.
Consider the computational domain $x \in [0,1]$ and the time interval $t \in [0, 1]$.
We set the initial and final distributions to be $X_0 \sim N(0.5, 0.05^2)$ and
$X_1 \sim N(0.5, 0.1^2)$ respectively, where $N(\mu, \sigma^2)$ denotes the normal distribution. The
following cost function was chosen:

$$F(\gamma) = \begin{cases} (\gamma - \bar{\gamma})^2, & \gamma \ge 0, \\ +\infty, & \text{otherwise,} \end{cases} \tag{49}$$

where $\bar{\gamma}$ was set to 0.00375 so that the optimal value of the variance is constant,
$\sigma^2 = 0.1^2 - 0.05^2 = 0.0075$. We then discretized the space-time domain as a $128 \times 128$
lattice. The penalization parameter is set to $r = 64$. The results after 3000 iterations
are shown in Figs. 1 and 2, and the convergence of the residuals is shown in Fig. 3.
The convergence speed decays quickly, but we reach a good approximation after
about 500 iterations. The noisy tails in Fig. 2 correspond to regions where the density
$\rho$ is close to zero. The diffusion process has a very low probability of reaching these
regions, so the value of $\sigma^2$ has little impact. In areas where $\rho$ is not close to zero,
$\sigma^2$ remains constant, which matches the analytical solution.
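The analytical solution referred to here can be written out in a few lines. The derivation below is a sketch that assumes constant volatility and reads (11) as the Fokker-Planck equation with $\gamma = \sigma^2/2$; it is consistent with the stated values of $\bar{\gamma}$ and $\sigma^2$ but is not quoted from the text.

```latex
\begin{align*}
X_1 &= X_0 + \sigma W_1, \qquad W_1 \sim N(0,1) \text{ independent of } X_0,\\
\operatorname{Var}(X_1) &= \operatorname{Var}(X_0) + \sigma^2
  \;\Longrightarrow\; \sigma^2 = 0.1^2 - 0.05^2 = 0.0075,\\
\bar{\gamma} &= \tfrac{1}{2}\sigma^2 = 0.00375,\\
\rho(t,\cdot) &= N\!\left(0.5,\; 0.05^2 + 0.0075\,t\right), \qquad t \in [0,1].
\end{align*}
```

With $\gamma \equiv \bar{\gamma}$ the cost (49) vanishes identically, so this constant-volatility interpolation attains the minimum of (9).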


6 Summary 


This paper focuses on a new approach for the calibration of local volatility models.
Given the distributions of the asset price at two fixed dates, the technique of
optimal transport is applied to interpolate the distributions and recover the local
volatility function, while maintaining the martingale property of the underlying
process. Inspired by Benamou and Brenier [1], the problem is first converted into
a saddle point problem, and then solved numerically by an augmented Lagrangian
approach and the alternating direction method of multipliers (ADMM) algorithm.
The algorithm performs well on a simple case in which the numerical solution
matches its analytical counterpart. The main drawback of this method is
the slow convergence rate of the ADMM algorithm. We observed that a higher
penalization parameter may lead to faster convergence. Further research is required
to conduct more numerical experiments, improve the efficiency of the algorithm and
apply it to more complex cases.

Fig. 1 The density function $\rho(t, x)$

Fig. 2 The variance $\sigma^2(t, x)$

Fig. 3 The residual $\mathrm{res}^n$


Likelihood Informed Dimension
Reduction for Remote Sensing
of Atmospheric Constituent Profiles


Otto Lamminpää, Marko Laine, Simo Tukiainen, and Johanna Tamminen


Abstract We use likelihood informed dimension reduction (LIS) (Cui et al.
Inverse Prob 30(11):114015, 28, 2014) for inverting vertical profile information
of atmospheric methane from ground based Fourier transform infrared (FTIR)
measurements at Sodankylä, Northern Finland. The measurements belong to the
worldwide TCCON network for greenhouse gas measurements and, in addition to
providing accurate greenhouse gas measurements, they are important for validating
satellite observations.

LIS allows construction of an efficient Markov chain Monte Carlo sampling 
algorithm that explores only a reduced dimensional space but still produces a good 
approximation of the original full dimensional Bayesian posterior distribution. This 
in effect makes the statistical estimation problem independent of the discretization 
of the inverse problem. In addition, we compare LIS to a dimension reduction 
method based on prior covariance matrix truncation used earlier (Tukiainen et al., J 
Geophys Res Atmos 121:10312—10327, 2016). 


1 Introduction 


Atmospheric composition measurements have an increasingly crucial role in
monitoring greenhouse gas concentrations in order to understand and predict
changes in climate. The warming effect of greenhouse gases, such as carbon
dioxide (CO2) and methane (CH4), is based on the absorption of electromagnetic
radiation originating from the sun by these trace gases. This mechanism has a strong
theoretical base and has been confirmed by recent observations [4].

Remote sensing measurements of atmospheric composition, and greenhouse
gases in particular, are carried out by ground-based Fourier transform infrared
(FTIR) spectrometers, and more recently by a growing number of satellites (for
example SCIAMACHY, ACE-FTS, GOSAT, OCO-2). The advantage of satellite
measurements is that they provide global coverage. They are used for anthropogenic
emission monitoring, detecting trends in atmospheric composition and studying
the effects of the biosphere, to name but a few examples. Accurate ground-based
measurements are crucial to satellite measurement validation, and the global
Total Carbon Column Observing Network (TCCON [17]) of FTIR spectrometers,
consisting of around 20 measurement sites around the world, is widely used as a
reference [3]. The FTIR instrument looks directly at the sun, returning an absorption
spectrum as measured data.

Determining atmospheric gas density profiles from the absorption spectra, known as
retrieval, is an ill-posed inverse problem as the measurement contains only a limited
amount of information about the state of the atmosphere. Based on prior knowledge
and using the Bayesian approach to regularize the problem, the profile retrieval
is possible, provided that our prior accurately describes the possible states that
may occur in the atmosphere. When retrieving a vertical atmospheric profile, the
dimension of the estimation problem depends on the discretization. For accurate
retrievals a high number of layers is needed, leading to computationally costly
algorithms. However, fast methods are required for the operational algorithm. For
this purpose, different ways of reducing the dimension of the problem have been
developed. The official operational TCCON GGG algorithm [17] solves the inverse
problem by scaling the prior profile based on the measured data. This method is
robust and computationally efficient, but only retrieves one piece of information
and thus can give largely inaccurate results about the density profiles.

An improved dimension reduction method for the FTIR retrieval based on 
reducing the rank of the prior covariance matrix was used by Tukiainen et al. 
[16] using computational methods developed by Solonen et al. [13]. This method 
confines the solution to a subspace spanned by the non-negligible eigenvectors of the 
prior covariance matrix. This approach allows a retrieval using more basis functions 
than the operational method and thus gives more accurate solutions. However, the 
prior has to be hand tuned to have a number of non-zero singular values that 
correspond to the number of degrees of freedom for the signal in the measurement. 
Moreover, whatever information lies in the complement of this subspace remains 
unused. 

In this work, we introduce an analysis method for determining the number of
components the measurement can provide information from [12], as well as the
likelihood informed subspace dimension reduction method for non-linear statistical
inverse problems [2, 14]. We show that these two formulations are in fact equal.
We then proceed to implement a dimension reduction scheme for the FTIR inverse
problem using adaptive MCMC sampling [5, 6] to fully characterize the non-linear
posterior distribution, and show that this method gives an optimal result with
respect to Hellinger distance to the non-approximated full dimensional posterior
distribution. In contrast with the previously implemented prior reduction method,
the likelihood informed subspace method is also shown to give the user freedom to
use a prior derived directly from an ensemble of previously conducted atmospheric
composition measurements.

We consider the atmospheric composition measurement carried out at the FMI Arctic
Research Centre, Sodankylä, Finland [9]. The on-site Fourier transform infrared
spectrometer (FTIR) measures solar light arriving at the device directly from the
sun, or more precisely, the absorption of solar light at different wavelengths within
the atmosphere. From the absorption spectra of different trace gases (CO2, CH4,
...) we can compute the corresponding vertical density profiles, i.e. the fraction of
the trace gas in question as a function of height.

Let us consider the absorption spectrum with $m$ separate wavelengths. The solar
light passing through the atmosphere and hitting the detector can be modeled
using the Beer-Lambert law, which gives, for wavelengths $\lambda_j$, $j \in \{1, \ldots, m\}$, the
intensity of detected light as

$$I(\lambda_j) = I_0(\lambda_j) \exp\left( -\sum_{k=1}^{K} \int_0^{\infty} C_k(\lambda_j, z)\, \rho_k(z)\, dz \right) \left( a\lambda_j^2 + b\lambda_j + c \right) + d, \tag{1}$$

where $I_0$ is the intensity of solar light when it enters the atmosphere, the atmosphere
has $K$ absorbing trace gases, $C_k(\lambda_j, z)$ is the absorption coefficient of gas $k$, which
depends on height $z$ and on the wavelength $\lambda_j$, and $\rho_k(z)$ is the density of gas $k$ at
height $z$. The second degree polynomial and the constant $d$ in (1) are used to describe
instrument related features and the continuity properties of the spectrum. In reality,
solar light is scattered on the way by atmospheric particles. This phenomenon is
relatively weak in the wavelength band we are considering in this work, so it will be
ignored for simplicity.

The absorption in a continuous atmosphere is modeled by discretizing the integral
in Eq. (1) into a sum over atmospheric layers and assuming a constant absorption
for each separate layer. This way, a discrete computational forward model can
be constructed, giving an absorption spectrum as data produced by applying the
forward model to a state vector $x$ describing the discretized atmospheric density
profile for a certain trace gas. In this work, we limit ourselves to the
retrieval of atmospheric methane (CH4).
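The layer-wise discretization of Eq. (1) can be sketched as follows. This is an illustrative forward model under stated assumptions, not the code used in this work: the array layout, the function name `forward_model`, and the synthetic absorption coefficients in the usage lines are all hypothetical.

```python
import numpy as np


def forward_model(density, cross_section, layer_thickness, solar, wavelengths,
                  poly=(0.0, 0.0, 1.0), offset=0.0):
    """Discretised version of Eq. (1).

    density:         (K, L) gas densities per layer
    cross_section:   (K, m, L) absorption coefficients per gas, wavelength, layer
    layer_thickness: (L,) layer thicknesses
    solar:           (m,) top-of-atmosphere intensities I_0
    """
    # Optical depth: sum over gases k and layers l of C_k * rho_k * dz.
    optical_depth = np.einsum("kml,kl,l->m", cross_section, density, layer_thickness)
    a, b, c = poly
    baseline = a * wavelengths**2 + b * wavelengths + c
    return solar * np.exp(-optical_depth) * baseline + offset


# Usage with synthetic numbers: one gas, five layers, four wavelengths.
rng = np.random.default_rng(0)
wavelengths = np.linspace(1.6, 1.7, 4)
solar = np.ones(4)
cross_section = rng.uniform(0.1, 1.0, size=(1, 4, 5))
thickness = np.full(5, 0.2)
empty = forward_model(np.zeros((1, 5)), cross_section, thickness, solar, wavelengths)
absorbing = forward_model(np.full((1, 5), 0.5), cross_section, thickness, solar, wavelengths)
print(empty, absorbing)
```

With zero density the model returns the unattenuated baseline, and any positive density lowers the detected intensity at every wavelength.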


2.1 Bayesian Formulation of the Inverse Problem 


Consider the inverse problem of estimating an unknown parameter vector $x \in \mathbb{R}^n$ from
an observation $y \in \mathbb{R}^m$,

$$y = F(x) + \varepsilon, \tag{2}$$

where our physical model is described by the forward model $F : \mathbb{R}^n \to \mathbb{R}^m$ and the
random variable $\varepsilon \in \mathbb{R}^m$ represents the observation error arising from instrument
noise and forward model approximations. In the Bayesian approach to inverse
problems [7] our uncertainty about $x$ is described by statistical distributions. The
solution to the problem is obtained as the posterior distribution of $x$ conditioned on
a realization of the data $y$ and depending on our prior knowledge. By Bayes'
formula, we have

$$\pi(x|y) \propto \pi(y|x)\, \pi_{pr}(x), \tag{3}$$

where $\pi(x|y)$ is the posterior distribution, $\pi(y|x)$ the likelihood and $\pi_{pr}(x)$ the
prior distribution. The proportionality $\propto$ comes from a constant that does not depend
on the unknown $x$. In this work, we assume the prior to be Gaussian, $\mathcal{N}(x_0, \Sigma_{pr})$,
i.e.

$$\pi_{pr}(x) \propto \exp\left( -\frac{1}{2} (x - x_0)^T \Sigma_{pr}^{-1} (x - x_0) \right). \tag{4}$$

Also, the additive noise is assumed to be zero-mean Gaussian with known covariance
matrix, $\varepsilon \sim \mathcal{N}(0, \Sigma_{obs})$, so the likelihood has the form

$$\pi(y|x) \propto \exp\left( -\frac{1}{2} (y - F(x))^T \Sigma_{obs}^{-1} (y - F(x)) \right). \tag{5}$$


When the forward model is non-linear, the posterior distribution can be explored
by Markov chain Monte Carlo (MCMC) sampling. When the dimension of the
unknown is high, for example because of the discretization of the inverse problem, MCMC
is known to be inefficient. In this paper, we use dimension reduction to
make MCMC more efficient in high dimensional and computationally expensive problems.
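For reference, the unnormalised log-posterior that an MCMC sampler evaluates follows directly from (3), (4) and (5). The sketch below is a generic implementation with hypothetical names; the forward model is passed in as a callable, and the linear map in the usage lines is a placeholder rather than the FTIR model.

```python
import numpy as np


def log_posterior(x, y, forward, x0, prior_cov, obs_cov):
    """Unnormalised log of Eq. (3) with the Gaussian prior (4) and likelihood (5)."""
    dx = x - x0
    residual = y - forward(x)
    log_prior = -0.5 * dx @ np.linalg.solve(prior_cov, dx)
    log_likelihood = -0.5 * residual @ np.linalg.solve(obs_cov, residual)
    return log_prior + log_likelihood


# Usage with a placeholder linear forward model.
a_matrix = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]])
forward = lambda x: a_matrix @ x
x0 = np.array([0.5, -0.2, 1.0])
prior_cov = np.eye(3)
obs_cov = 0.1 * np.eye(2)
y = forward(x0)                      # noise-free data generated at the prior mean
print(log_posterior(x0, y, forward, x0, prior_cov, obs_cov))
print(log_posterior(x0 + 0.1, y, forward, x0, prior_cov, obs_cov))
```

Each evaluation costs one forward model run, which is why the number of MCMC steps needed in high dimension dominates the computational cost.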


2.2 Prior Reduction 


The operational GGG algorithm for the FTIR retrieval problem [17] is effectively
one-dimensional, as it only scales the prior mean profile. However, there are about
three degrees of freedom in the FTIR signal for the vertical profile information. To
construct basis functions that could utilize this information, a method that uses prior
reduction was developed in [16]. It is based on the singular value decomposition of
the prior covariance matrix,

$$\Sigma_{pr} = U \Lambda U^T = \sum_{i=1}^{m} \lambda_i u_i u_i^T, \tag{6}$$

which allows further decomposition as

$$\Sigma_{pr} = P P^T, \quad \text{with } P = \left( \sqrt{\lambda_1}\, u_1 \;\cdots\; \sqrt{\lambda_m}\, u_m \right). \tag{7}$$

If the prior can be chosen so that most of the singular values are negligible, then the
rank of the prior covariance matrix can be reduced by considering only the first $r$
singular values and vectors:

$$\widetilde{\Sigma}_{pr} = P_r P_r^T, \quad \text{with } P_r = \left( \sqrt{\lambda_1}\, u_1 \;\cdots\; \sqrt{\lambda_r}\, u_r \right). \tag{8}$$

The unknown $x$ has an approximate representation by $r$ basis vectors from the
columns of $P_r$ and a reduced dimensional parameter $\alpha \in \mathbb{R}^r$ as

$$x \approx x_0 + P_r \alpha. \tag{9}$$

By construction, the random vector $\alpha$ has a simple Gaussian prior, $\alpha \sim \mathcal{N}(0, I)$,
which allows us to write the approximate posterior as

$$\pi(x|y) \approx \widetilde{\pi}(\alpha|y) \propto \exp\left( -\frac{1}{2} \left( (y - F(x_0 + P_r\alpha))^T \Sigma_{obs}^{-1} (y - F(x_0 + P_r\alpha)) + \alpha^T \alpha \right) \right). \tag{10}$$

Now, instead of running MCMC in the full space defined by $x$, we can sample the
low dimensional parameter $\alpha$ and retain the approximation of the full posterior by
Eq. (9).
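The construction of $P_r$ in (8) takes only a few lines of linear algebra. The sketch below uses a symmetric eigendecomposition, which coincides with the singular value decomposition for a symmetric positive semi-definite matrix; the squared-exponential covariance in the usage lines is a synthetic stand-in for a real prior, and the function name is hypothetical.

```python
import numpy as np


def reduced_prior_basis(prior_cov, r):
    """Return P_r of Eq. (8): the r leading eigenvectors scaled by sqrt(eigenvalue)."""
    eigenvalues, eigenvectors = np.linalg.eigh(prior_cov)   # ascending order
    order = np.argsort(eigenvalues)[::-1][:r]               # keep the r largest
    return eigenvectors[:, order] * np.sqrt(np.clip(eigenvalues[order], 0.0, None))


# Usage with a smooth synthetic covariance on 30 layers.
n = 30
grid = np.linspace(0.0, 1.0, n)
prior_cov = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 0.2) ** 2)
p_r = reduced_prior_basis(prior_cov, 4)
x0 = np.zeros(n)
alpha = np.array([1.0, -0.5, 0.2, 0.0])
x = x0 + p_r @ alpha                                        # Eq. (9)
truncation_error = np.linalg.norm(prior_cov - p_r @ p_r.T, 2)
print(p_r.shape, truncation_error)
```

The spectral norm of the truncation error equals the first discarded eigenvalue, which gives a direct way to judge how many basis vectors a given prior needs.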


2.3. Likelihood-Informed Subspace 


The prior reduction approach depends on the ability to construct a realistic prior
that can be described by only a few principal components. For the FTIR retrieval
problem this is possible to some extent [16]. However, there are several possible
caveats. We have to manually manipulate the prior covariance matrix to have a lower
rank, which can lead to information loss as the solution will be limited to a subspace
defined by the reduced prior only.

In atmospheric remote sensing the information content of the measurement is an
important concept to be considered when designing the instruments and constructing
the retrieval methodology; we refer to the book by Rodgers [12].

Consider a linearized version of the inverse problem in Eq. (2),

$$y = J(x - x_0) + \varepsilon, \tag{11}$$

with Gaussian prior and noise. The forward model is assumed to be differentiable,
and $J$ denotes the Jacobian matrix of the forward model with elements $J_{ij} = \frac{\partial}{\partial x_j} F_i$.
Using Cholesky factorizations for the known prior and error covariances,

$$\Sigma_{pr} = L_{pr} L_{pr}^T, \qquad \Sigma_{obs} = L_{obs} L_{obs}^T, \tag{12}$$

we can perform pre-whitening of the problem by setting

$$\tilde{y} = L_{obs}^{-1} y, \qquad \tilde{J} = L_{obs}^{-1} J L_{pr}, \qquad \tilde{x} = L_{pr}^{-1}(x - x_0), \qquad \tilde{\varepsilon} = L_{obs}^{-1} \varepsilon. \tag{13}$$

Now the problem can be written as

$$\tilde{y} = \tilde{J} \tilde{x} + \tilde{\varepsilon}, \tag{14}$$

with $\tilde{\varepsilon} \sim \mathcal{N}(0, I)$ and a priori $\tilde{x} \sim \mathcal{N}(0, I)$.

As the unknown $x$ and the error $\varepsilon$ are assumed to be independent, the same holds
for the scaled versions. We can compare the prior variability of the observation
depending on $x$ and that coming from the noise $\varepsilon$ by

$$\tilde{S}_y = E[\tilde{y}\tilde{y}^T] = E[(\tilde{J}\tilde{x} + \tilde{\varepsilon})(\tilde{J}\tilde{x} + \tilde{\varepsilon})^T] = \tilde{J}\tilde{J}^T + I. \tag{15}$$

The variability in $y$ that depends only on the parameter $x$ depends itself on $\tilde{J}\tilde{J}^T$
and it can be compared to the unit matrix $I$ that has the contribution from the scaled
noise. The directions in $\tilde{J}\tilde{J}^T$ which are larger than unity are those dominated by
the signal. Formally this can be seen by diagonalizing the scaled problem by the
singular value decomposition,

$$\tilde{J} = W \Lambda V^T, \tag{16}$$

and setting

$$y' = W^T \tilde{y} = W^T \tilde{J} \tilde{x} + W^T \tilde{\varepsilon} = \Lambda V^T \tilde{x} + \varepsilon' = \Lambda x' + \varepsilon'. \tag{17}$$

The transformed variables $\varepsilon'$ and $x'$ retain the unit covariance matrix. In other words,
$y'$ is distributed with covariance $\Lambda^2 + I$. This is a diagonal matrix, and the elements
of the vector $y'$ that are not masked by the measurement error are those corresponding
to the singular values $\lambda_i > 1$ of the pre-whitened Jacobian $\tilde{J}$. Furthermore, degrees
of freedom for signal and noise are invariant under linear transformations [12], so
the same result is also valid for the original $y$.

Another way to compare the information content of the measurement relative to
the prior was used in [2]. This is to use the Rayleigh quotient

$$\mathcal{R}(L_{pr} a) = \frac{a^T L_{pr}^T H L_{pr} a}{a^T a}, \tag{18}$$

where $a \in \mathbb{R}^n$ and $H = J^T \Sigma_{obs}^{-1} J$ is the Gauss-Newton approximation of the Hessian
matrix of the data misfit function

$$\eta(x) = \frac{1}{2} (y - F(x))^T \Sigma_{obs}^{-1} (y - F(x)). \tag{19}$$

Directions for which $\mathcal{R}(L_{pr} a) > 1$ are the ones in which the likelihood contains
information relative to the prior. This follows from the fact that the $i$th eigenvector
$v_i$ of the prior-preconditioned Gauss-Newton Hessian

$$\tilde{H} := L_{pr}^T H L_{pr} \tag{20}$$

maximizes the Rayleigh quotient over the subspace $\mathbb{R}^n \setminus \mathrm{span}\{v_1, \ldots, v_{i-1}\}$ and the
$r$ directions $v_i$ for which $\mathcal{R}(L_{pr} v_i) > 1$ correspond to the first $r$ eigenvalues of $\tilde{H}$.
We call these vectors the informative directions of the measurement.

To see the correspondence between the two approaches for the informative directions,
we notice that for $\tilde{H}(x)$ it holds that

$$\tilde{H}(x) = L_{pr}^T H(x) L_{pr} = L_{pr}^T J(x)^T \Sigma_{obs}^{-1} J(x) L_{pr} = \left( L_{obs}^{-1} J(x) L_{pr} \right)^T \left( L_{obs}^{-1} J(x) L_{pr} \right) = \tilde{J}(x)^T \tilde{J}(x). \tag{21}$$

The eigenvalues $\lambda^2$ of the matrix $\tilde{H}(x)$ less than unity correspond to the singular values
$\lambda$ less than unity of the scaled Jacobian $\tilde{J}(x)$. The corresponding eigenvectors are
the same as the right singular vectors $v$ of $\tilde{J}$. The informative and non-informative
directions for a simple 2-dimensional Gaussian case are illustrated in Fig. 1.
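The identity (21) is easy to confirm numerically. The sketch below builds a random linear problem, pre-whitens the Jacobian as in (13), and compares the squared singular values of $\tilde{J}$ with the eigenvalues of $\tilde{H}$; all matrices are synthetic and the dimensions are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4                                   # state and data dimensions (toy sizes)
jacobian = rng.normal(size=(m, n))            # synthetic Jacobian J

a = rng.normal(size=(n, n))
prior_cov = a @ a.T + n * np.eye(n)           # synthetic SPD prior covariance
obs_cov = np.diag(rng.uniform(0.5, 2.0, m))   # synthetic noise covariance

l_pr = np.linalg.cholesky(prior_cov)          # prior_cov = l_pr @ l_pr.T
l_obs = np.linalg.cholesky(obs_cov)

# Pre-whitened Jacobian of Eq. (13): L_obs^{-1} J L_pr.
j_tilde = np.linalg.solve(l_obs, jacobian @ l_pr)

# Prior-preconditioned Gauss-Newton Hessian of Eq. (20).
hessian = jacobian.T @ np.linalg.solve(obs_cov, jacobian)
h_tilde = l_pr.T @ hessian @ l_pr

singular_values = np.linalg.svd(j_tilde, compute_uv=False)
eigenvalues = np.sort(np.linalg.eigvalsh(h_tilde))[::-1]
print(singular_values**2)
print(eigenvalues[:m])
```

Since there are fewer observations than unknowns in this toy case, the remaining $n - m$ eigenvalues of $\tilde{H}$ are zero: those directions are informed by the prior alone.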


Fig. 1 Illustration of an informative direction $x_r$ and a non-informative direction $x_\perp$ using a
2-dimensional Gaussian case. Here, the likelihood has only one informative component, so the
remaining direction for the posterior is obtained from the prior. (Legend: informative directions,
prior, likelihood, posterior, prior mean, posterior mean.)


Next, we use the informative directions of the measurement to reduce the
dimension of the inverse problem. Consider approximations for the posterior of the
form

$$\tilde{\pi}(x|y) \propto \pi(y|\Pi_r x)\, \pi_{pr}(x), \tag{22}$$

where $\Pi_r$ is a rank $r$ projection matrix. In [2] and [14] it was shown that for any given
$r$, there exists a unique optimal projection $\Pi_r$ that minimizes the Hellinger distance
between the approximate rank $r$ posterior and the full posterior. Furthermore,
using the connection to Rodgers' formalism, the optimal projection can be obtained
explicitly with the following definition.

Definition 1 (LIS) Let $V_r \in \mathbb{R}^{n\times r}$ be a matrix containing the first $r$ right singular
vectors of the scaled Jacobian $\tilde{J}$. Define

$$\Phi_r := L_{pr} V_r \quad \text{and} \quad \Theta_r := L_{pr}^{-T} V_r. \tag{23}$$

The rank $r$ LIS projection for the posterior approximation (22) is given by

$$\Pi_r = \Phi_r \Theta_r^T. \tag{24}$$

The range $X_r$ of the projection $\Pi_r : \mathbb{R}^n \to X_r$ is a subspace of the state space $\mathbb{R}^n$
spanned by the column vectors of the matrix $\Phi_r$. We call the subspace $X_r$ the
likelihood-informed subspace (LIS) for the linear inverse problem, and its complement $\mathbb{R}^n \setminus X_r$
the complement subspace (CS).


Definition 2 The matrix of singular vectors $V = [V_r \; V_\perp]$ forms a complete
orthonormal system in $\mathbb{R}^n$ and we can define

$$\Phi_\perp := L_{pr} V_\perp \quad \text{and} \quad \Theta_\perp := L_{pr}^{-T} V_\perp, \tag{25}$$

and the projection $I - \Pi_r$ can be written as

$$I - \Pi_r = \Phi_\perp \Theta_\perp^T. \tag{26}$$

Define the LIS-parameter $x_r \in \mathbb{R}^r$ and the CS-parameter $x_\perp \in \mathbb{R}^{n-r}$ as

$$x_r = \Theta_r^T x, \qquad x_\perp = \Theta_\perp^T x. \tag{27}$$

The parameter $x$ can now be naturally decomposed as

$$x = \Pi_r x + (I - \Pi_r) x = \Phi_r x_r + \Phi_\perp x_\perp. \tag{28}$$

Using this decomposition and properties of multivariate Gaussian distributions, we
can write the prior as

$$\pi_{pr}(x) = \pi_r(x_r)\, \pi_\perp(x_\perp) \tag{29}$$

and approximate the likelihood by using the $r$ informative directions,

$$\pi(y|x) = \pi(y|\Phi_r x_r)\, \pi(y|\Phi_\perp x_\perp) \approx \pi(y|\Phi_r x_r), \tag{30}$$

which leads us to the approximate posterior

$$\tilde{\pi}(x|y) = \pi(y|\Phi_r x_r)\, \pi_r(x_r)\, \pi_\perp(x_\perp). \tag{31}$$

When the forward model is not linear, the Jacobian and Hessian matrices depend
on the parameter $x$ and the criterion (18) only holds pointwise. To extend this
local condition into a global one, we consider the expectation of the local Rayleigh
quotient $\mathcal{R}(L_{pr} v; x)$ over the posterior,

$$E[\mathcal{R}(L_{pr} v; x)] = \frac{v^T \hat{J} v}{v^T v}, \qquad \hat{J} = \int \tilde{H}(x)\, \pi(x|y)\, dx. \tag{32}$$

The expectation is with respect to the posterior distribution, which is not available
before the analysis. In practice, an estimate is obtained by Monte Carlo,

$$\hat{J}_n = \frac{1}{n} \sum_{k=1}^{n} \tilde{H}(x^{(k)}), \tag{33}$$

where $x^{(k)}$ is a set of samples from some reference distribution which will be
discussed later in this work. We can now use the singular value decomposition
$\hat{J}_n = W \Lambda V^T$ to find a basis for the global LIS analogously to the linear case.

The advantage of LIS dimension reduction is that it is sufficient to use MCMC
to sample the low-dimensional $x_r$ from the reduced posterior $\pi(y|\Phi_r x_r)\, \pi_r(x_r)$,
and form the full space approximation using the known analytic properties of the
Gaussian complement prior $\pi_\perp(x_\perp)$.
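The construction of the global LIS basis can be sketched as follows: average the prior-preconditioned Gauss-Newton Hessian over samples as in (33), take its leading eigenvectors, and form $\Phi_r$, $\Theta_r$ and $\Pi_r$ as in (23) and (24). The code is a minimal illustration with hypothetical names; it uses a symmetric eigendecomposition in place of the SVD (equivalent for a symmetric positive semi-definite matrix), and the forward model in the usage lines is a toy nonlinear map, not the FTIR model.

```python
import numpy as np


def lis_projector(jacobian_at, samples, l_pr, l_obs, r):
    """Build Phi_r, Theta_r and Pi_r = Phi_r Theta_r^T from an averaged Hessian."""
    n = l_pr.shape[0]
    averaged = np.zeros((n, n))
    for x in samples:
        j_tilde = np.linalg.solve(l_obs, jacobian_at(x) @ l_pr)
        averaged += j_tilde.T @ j_tilde / len(samples)   # Monte Carlo mean, Eq. (33)
    eigenvalues, eigenvectors = np.linalg.eigh(averaged)
    v_r = eigenvectors[:, np.argsort(eigenvalues)[::-1][:r]]
    phi_r = l_pr @ v_r
    theta_r = np.linalg.solve(l_pr.T, v_r)               # L_pr^{-T} V_r
    return phi_r, theta_r, phi_r @ theta_r.T


# Usage with a toy forward model F(x) = tanh(A x), whose Jacobian is known.
rng = np.random.default_rng(1)
n, m, r = 5, 3, 2
a_matrix = rng.normal(size=(m, n))
jacobian_at = lambda x: (1.0 - np.tanh(a_matrix @ x) ** 2)[:, None] * a_matrix
b = rng.normal(size=(n, n))
l_pr = np.linalg.cholesky(b @ b.T + n * np.eye(n))
l_obs = np.eye(m)
samples = rng.normal(size=(20, n))
phi_r, theta_r, pi_r = lis_projector(jacobian_at, samples, l_pr, l_obs, r)
x = rng.normal(size=n)
x_r = theta_r.T @ x                                      # LIS parameter, Eq. (27)
print(x_r, np.linalg.norm(pi_r @ x - phi_r @ x_r))
```

The projector is oblique rather than orthogonal, since $\Phi_r$ and $\Theta_r$ differ whenever the prior covariance is not the identity, but it remains idempotent because $\Theta_r^T \Phi_r = I$.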


3 Results 


To solve the inverse problem related to the FTIR measurement [16], we use the adaptive
MCMC [6, 10] and SWIRLAB [15] toolboxes for Matlab. The results from the newly
implemented LIS algorithm as well as from the previous prior reduction method are
compared against a full dimensional MCMC simulation using the Hellinger distance
of approximations to the full posterior. We use a prior derived from an ensemble of
atmospheric composition measurements by the ACE satellite [1]. The vertical prior
distribution, prior covariance and prior singular values are illustrated in Fig. 2.

Fig. 2 The prior derived from an ensemble of ACE satellite measurements. Left: Full prior profile,
mean with dashed line and 95% probability limits in grey. Top right: covariance matrix derived
from the measurements. Bottom right: first 20 singular values of the prior covariance matrix

In Fig. 3, we show the results of our retrievals using full-space MCMC, compared
with LIS dimension reduction and prior reduction using four basis vectors in each
method. The retrievals are further compared against accurate in-situ measurements
made using AirCore balloon soundings [8], which are available for the selected
cases and are also included in Fig. 3. In this example, the Monte Carlo estimator (33)
for $\hat{J}$ in Eq. (32) was computed using 1000 samples drawn from the Laplace
approximation $\mathcal{N}(\hat{x}, \hat{\Sigma}_{post})$, where $\hat{x}$ and $\hat{\Sigma}_{post}$ are the posterior MAP estimate and
covariance, respectively, obtained using optimal estimation [12].

In order to compare the performance of MCMC methods, we define the sample
speed of an MCMC run as follows.

Definition 3 The effective sample size $N_{eff}$ of an MCMC chain is given by

$$N_{eff} = \frac{N_M}{1 + 2 \sum_{k=1}^{\infty} \rho_k(x)}, \tag{34}$$

where $N_M$ is the length of the MCMC chain and $\rho_k(x)$ is the lag-$k$ autocorrelation for
parameter $x$ [11]. Define the sample speed of an MCMC chain as

$$V = \frac{N_{eff}}{t_M}, \tag{35}$$

where $t_M$ is the total computation time of the MCMC chain.

Fig. 3 Atmospheric CH4 density profile retrieval results. Retrieved posterior in green, prior in
gray, and in-situ AirCore measurement in red. The color shading indicates areas where 95% of the
profiles are. Left: MCMC in full space. Middle: MCMC with LIS. Right: MCMC with prior
reduction
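In practice the infinite sum in (34) has to be truncated. The sketch below estimates the autocorrelation by FFT and cuts the sum at the first non-positive autocorrelation, which is one common heuristic and an assumption of this example rather than the rule used in the toolbox cited above. The function names and the synthetic chains are hypothetical.

```python
import numpy as np


def autocorrelation(chain):
    """Sample autocorrelation of a 1-D chain for lags 0..N-1, computed via FFT."""
    n = len(chain)
    centred = chain - np.mean(chain)
    spectrum = np.fft.rfft(centred, n=2 * n)          # zero padding avoids wrap-around
    autocov = np.fft.irfft(spectrum * np.conj(spectrum))[:n] / n
    return autocov / autocov[0]


def effective_sample_size(chain):
    """Eq. (34), with the infinite sum cut at the first non-positive autocorrelation."""
    rho = autocorrelation(np.asarray(chain, dtype=float))
    non_positive = np.flatnonzero(rho[1:] <= 0.0)
    cutoff = non_positive[0] if non_positive.size else len(rho) - 1
    return len(rho) / (1.0 + 2.0 * np.sum(rho[1:1 + cutoff]))


def sample_speed(chain, seconds):
    """Eq. (35): effective samples per second of computation."""
    return effective_sample_size(chain) / seconds


# Usage: an uncorrelated chain versus a strongly autocorrelated AR(1) chain.
rng = np.random.default_rng(0)
length = 20000
white = rng.normal(size=length)
ar1 = np.zeros(length)
for i in range(1, length):
    ar1[i] = 0.9 * ar1[i - 1] + rng.normal()
print(effective_sample_size(white), effective_sample_size(ar1))
```

For an AR(1) chain with coefficient 0.9 the theoretical ratio $N_{eff}/N_M$ is $(1 - 0.9)/(1 + 0.9) \approx 0.05$, so the correlated chain carries roughly one twentieth of the information of an independent sample of the same length.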


For the MCMC runs shown in Fig. 3, we obtain the following sample speeds, in
samples per second:

$$V(\text{full}) = 1.56\ \mathrm{s}^{-1}, \qquad V(\text{LIS}) = 19.01\ \mathrm{s}^{-1}, \qquad V(\text{PriRed}) = 19.66\ \mathrm{s}^{-1}. \tag{36}$$

In order to compare the approximate posteriors obtained from prior reduction
and LIS dimension reduction against the full posterior, we use the discrete Hellinger
distance,

$$H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{ \sum_{i=1}^{k} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2 }, \tag{37}$$

where $P = (p_1, \ldots, p_k)$ and $Q = (q_1, \ldots, q_k)$ are discrete representations
of the full and approximate posterior distributions obtained from histograms of
corresponding MCMC runs. The Hellinger distances of both approximations to the
full posterior can be seen in Fig. 4 together with the corresponding sample speeds,
both as a function of the number of singular vectors used. In Fig. 4 we have also
visualized the first four singular vectors used in the prior reduction and LIS methods
for the example retrieval in Fig. 3.
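A histogram-based evaluation of the discrete Hellinger distance can be sketched as follows. It assumes the common normalisation by $1/\sqrt{2}$, so that values lie in $[0, 1]$, and it treats a single scalar parameter at a time; the bin count, the function name and the synthetic samples are choices made for this illustration.

```python
import numpy as np


def hellinger_from_samples(samples_p, samples_q, bins=50):
    """Discrete Hellinger distance (37) between histograms on a common set of bins."""
    lo = min(np.min(samples_p), np.min(samples_q))
    hi = max(np.max(samples_p), np.max(samples_q))
    edges = np.linspace(lo, hi, bins + 1)
    p = np.histogram(samples_p, bins=edges)[0] / len(samples_p)
    q = np.histogram(samples_q, bins=edges)[0] / len(samples_q)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


# Usage with synthetic chains standing in for MCMC output.
rng = np.random.default_rng(0)
full = rng.normal(0.0, 1.0, size=5000)
shifted = rng.normal(0.5, 1.0, size=5000)
print(hellinger_from_samples(full, full), hellinger_from_samples(full, shifted))
```

Identical histograms give a distance of zero and histograms with disjoint support give one, so the value can be read as a bounded measure of how much of the full posterior the approximation misses.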
Fig. 4 Left: Hellinger distances to full posterior and sample speeds of corresponding MCMC runs
as functions of singular vectors used in the approximation. Top right: first four singular vectors
from prior reduction. Bottom right: first four singular vectors of $\tilde{J}$ forming the LIS basis

4 Conclusions


Although both of the discussed dimension reduction methods provide roughly the
same computational gains in the performance of the MCMC sampler, we see from
Fig. 4 that, when an empirical prior is used, the prior reduction method requires many
more singular vectors to achieve the same Hellinger distance from the full posterior
as the LIS method, which is already close with four singular vectors. We
conclude that the LIS method gives an efficient MCMC sampling algorithm to solve
the inverse problem arising from the FTIR retrieval, with the additional improvement
of allowing the direct usage of an empirical prior.


Abstract Contour integrals in the complex plane are the basis of effective numerical 
methods for computing matrix functions, such as the matrix exponential and 
the Mittag-Leffler function. These methods provide successful ways to solve partial 
differential equations, such as convection–diffusion models. Part of the success 
of these methods comes from exploiting the freedom to choose the contour, by 
appealing to Cauchy’s theorem. However, the pseudospectra of non-normal matrices 
or operators present a challenge for these methods: if the contour is too close to 
regions where the norm of the resolvent matrix is large, then the accuracy suffers. 
Important applications that involve non-normal matrices or operators include the 
Black-Scholes equation of finance, and Fokker–Planck equations for stochastic 
models arising in biology. Consequently, it is crucial to choose the contour carefully. 
As a remedy, we discuss choosing a contour that is wider than it might otherwise 
have been for a normal matrix or operator. We also suggest a semi-analytic approach 
to adapting the contour, in the form of a parabolic bound that is derived by 
estimating the field of values. 

To demonstrate the utility of the approaches that we advocate, we study 
three models in biology: a monomolecular reaction, a bimolecular reaction and a 
trimolecular reaction. Modelling and simulation of these reactions is done within 
the framework of Markov processes. We also consider non-Markov generalisations 
that have Mittag-Leffler waiting times instead of the usual exponential waiting times 
of a Markov process. 


1 Introduction 


We begin with the Chapman–Kolmogorov forward equation, associated with a 
Markov process on discrete states, for the evolution in continuous time of the 
probability of being in state $j$ at time $t$: 

$$\frac{d}{dt} p(j,t) = -|a_{j,j}|\, p(j,t) + \sum_{i \neq j} a_{j,i}\, p(i,t). \tag{1}$$

Here for $j \neq i$, $a_{j,i} \geq 0$, and $a_{j,i}\,dt$ is approximately the probability to transition 
from state $i$ to state $j$ in a small time $dt$. The diagonal entry of an associated matrix 
$A = \{a_{j,i}\}$ is defined by the requirement that the matrix has columns that sum to 
zero, namely $a_{j,j} = -\sum_{i \neq j} a_{i,j}$. This equation in the Markov setting (1) can be 
generalised to a non-Markovian form: 

$$\frac{d}{dt} p(j,t) = -\int_0^t K(j, t-u)\, p(j,u)\, du + \sum_{i \neq j} \frac{a_{j,i}}{|a_{i,i}|} \int_0^t K(i, t-u)\, p(i,u)\, du. \tag{2}$$


Here the so-called memory function $K(i, t-u)$ is defined via its Laplace transform 
$\hat{K}(j, s)$, as the ratio of the Laplace transform of the waiting time to the Laplace 
transform of the survival time. The process is a Markov process if and only if the 
waiting times are exponential random variables, in which case the $K(j, t)$ appearing 
in the convolutions in (2) are Dirac delta distributions and (2) reduces to the usual 
equation in (1). In the special case of Mittag-Leffler waiting times, (2) can be 
rewritten as [12, 13]: 

$$D_t^{\alpha} p = A p \quad \text{with solution} \quad p(t) = E_{\alpha}(A t^{\alpha})\, p(0). \tag{3}$$

Here $D_t^{\alpha}$ denotes the Caputo fractional derivative, and the Mittag-Leffler 
function is 

$$E_{\alpha}(z) = \sum_{k=0}^{\infty} \frac{z^k}{\Gamma(\alpha k + 1)}. \tag{4}$$

Here $0 < \alpha < 1$. In Sect. 4, we provide a MATLAB code to simulate sample 
paths of this stochastic process with Mittag-Leffler waiting times. When $\alpha = 1$ 
the series (4) reduces to the usual exponential function, and the fractional equation 
in (3) reduces to the original Markov process (1). In Sect. 5, we provide a MATLAB 
code to compute the solution $E_{\alpha}(A t^{\alpha})\, p(0)$ in (3) directly, via a contour integral. 
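
For scalar arguments of modest size, the series (4) can be summed directly. The Python sketch below is only an illustration of the definition (the function name and the fixed number of terms are choices made here); direct summation is not a robust method for large $|z|$, which is one reason to prefer the contour integral approach.

```python
import math


def mittag_leffler_series(z, alpha, terms=100):
    """Partial sum of the Mittag-Leffler series (4).

    Assumption: |z| is modest and 0 < alpha <= 1, so that a fixed number
    of terms suffices and math.gamma does not overflow.
    """
    total = 0.0
    for k in range(terms):
        total += z ** k / math.gamma(alpha * k + 1.0)
    return total
```

Two known special cases give a quick sanity check: $E_1(z) = \exp(z)$, and $E_{1/2}(-x) = \exp(x^2)\,\mathrm{erfc}(x)$.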

Next, we introduce the (Markovian) Fokker–Planck partial differential equation 
(sometimes also known as the forward Kolmogorov equation) 

$$\frac{\partial}{\partial t} p(x,t) = -\frac{\partial}{\partial x}\big(a(x)\, p(x,t)\big) + \frac{1}{2} \frac{\partial^2}{\partial x^2}\big(b(x)\, p(x,t)\big) \tag{5}$$

for a probability density $p(x, t)$ in one space dimension and time, with suitable 
boundary conditions, initial conditions, and smoothness assumptions on the 
coefficients $a(x)$, and on $b(x) > 0$. Later in the work we will use complex-variable 
methods, so it is worth noting at the outset that our coefficients $a(x)$ and $b(x)$ are 
always real-valued. It is also worth noting that $a(x)$ here in (5) is not the same as 
$a_{i,j}$ appearing in the matrix above in (3), although there is a close relationship that 
allows one to be deduced from the other. This PDE for the density corresponds to a 
stochastic differential equation for sample paths 

$$dX = a(X)\, dt + \sqrt{b(X)}\, dW. \tag{6}$$

Both the PDE (5) and SDE (6) are continuous in the spatial variable. An introduction 
to these models, and their connections with discrete versions, can be found in [1, 5]. 
Our discrete models do respect important properties such as non-negativity. However, 
there are issues with the Fokker–Planck PDE model, the Chemical Langevin 
SDE model, and other diffusion approximations: often these approximations do 
not maintain positivity. These issues are discussed by Williams in the Kolmogorov 
Lecture and accompanying paper, where a method that maintains non-negativity is 
proposed [11]. 

We do not simulate or solve either of the PDE (5) or the SDE (6). We do however 
simulate and solve the closely related models that are discrete in space and that are 
governed by master equations (1) or generalized master equations (2), which can be 
thought of as finite difference discretizations of (5). In particular, the PDE (5) can 
be thought of as a continuous-in-space analogue of the discrete process in (1) that 
involves a matrix A. The utility of the PDE is that it is easier to find an estimate of the 
field of values of the spatial differential operator on the right hand side of (5) than of 
the corresponding matrix A. We can then use an estimate of one as an approximation 
for the other. 

Developing appropriate methods for modelling and simulation of these important 
stochastic processes is a necessary first step for more advanced scientific endeavors. 
A natural next step is an inverse problem, although we do not pursue that in this 
article. For example, it is of great interest to estimate the rate constants in the models 
described in Sect. 2, typically based on limited observations of samples resembling 
the simulation displayed in Fig. 4. It is more challenging to estimate the parameter 
$\alpha$ in fractional models, or indeed to address the question of whether or not a single $\alpha$ 
(the only case considered in this article) is appropriate. Interest in fractional models 
is growing fast as they are finding wide applications including in cardiac modelling 
[2], and this includes exciting new applications of Mittag-Leffler functions [3]. 

Section 2 introduces three models that can be cast in the form of (3) and Sect. 3 
uses these models as vignettes to exhibit manifestations of pseudospectra. Next we 
introduce two methods of simulation for these models. A Monte Carlo approach is 
presented in Sect. 4. Section 5 presents an alternative method that directly computes 
the solution of (3) as a probability vector via a contour integral. Finally we suggest 
a bound on the field of values that is useful when designing contour methods. 

To our knowledge Fig. 3 is the first visualization of the pseudospectra of the 
Schlögl reactions. The estimate in (24) is also new. In fact (24) is the specialization 
of our more general estimates appearing in (22) and (23) to the monomolecular 
model, but it should be possible to likewise adapt our more general estimates to 
other models such as bimolecular models. 


2 Three Fundamental Models 


All three models that we present are represented by tri-diagonal matrices: 
$j \notin \{i-1, i, i+1\} \Rightarrow A_{i,j} = 0$. Since all other entries are zero, below, we only specify the 
non-zero entries on the three main diagonals. In fact, an entry on the main diagonal 
is determined by the requirement that columns sum to zero (which corresponds to 
conservation of probability), so it would suffice to specify only the two non-zero 
diagonals immediately below and above the main diagonal. 


2.1 Monomolecular, Bimolecular and Trimolecular Models 


A model of a monomolecular reaction 

$$S_1 \rightleftharpoons S_2,$$

such as arises in chemical isomerisation [5], or in models of ion channels in cardiac 
electrophysiology or neuro-electrophysiology, can be represented by the following 
$N \times N$ matrix. 

Monomolecular model matrix 

$$A_{i,j} = \begin{cases} j-1, & i = j-1, \\ -m, & i = j, \\ m-j+1, & i = j+1. \end{cases} \tag{7}$$

Here we have assumed that the rate constants are equal to unity, $c_1 = c_2 = 1$, and 
we have assumed that $m = N - 1$ where $m$ is the maximum number of molecules. 
More details, including exact solutions, can be found in [8]. An instance of this 
matrix when $m = 5$ is 

$$A = \begin{pmatrix} -5 & 1 & & & & \\ 5 & -5 & 2 & & & \\ & 4 & -5 & 3 & & \\ & & 3 & -5 & 4 & \\ & & & 2 & -5 & 5 \\ & & & & 1 & -5 \end{pmatrix}. \tag{8}$$

The model is two-dimensional, but the conservation law allows it to be treated 
as effectively one-dimensional, by letting $x$ represent $S_1$, say. Then one possible 
corresponding continuous model (5) has drift coefficient $a(x) = -c_1 x + c_2(m - x)$ 
and diffusion coefficient $b(x) = c_1 x + c_2(m - x)$, for $0 < x < m$. In the discrete 
Markov case, the exact solution is binomial. When $c_1 = c_2$ the stationary probability 
density of the continuous Markov model is given by Gillespie [5] as a Gaussian, 
which highlights the issues associated with the continuous models such as choosing 
the domain, boundary conditions, and respecting positivity. To enforce positivity, 
we might instead pose the PDE on the restricted domain $0 < x < m$. 
Next we introduce a model for the bimolecular reaction 

$$S_1 + S_2 \rightleftharpoons S_3,$$

via the following $N \times N$ matrix. 

Bimolecular model matrix 

$$A_{i,j} = \begin{cases} j-1, & i = j-1, \\ -j+1-(m-j+1)^2, & i = j, \\ (m-j+1)^2, & i = j+1. \end{cases} \tag{9}$$

This matrix and the model are also introduced in [12], where more details can be 
found. A small example with $m = 5$ is 

$$A = \begin{pmatrix} -25 & 1 & & & & \\ 25 & -17 & 2 & & & \\ & 16 & -11 & 3 & & \\ & & 9 & -7 & 4 & \\ & & & 4 & -5 & 5 \\ & & & & 1 & -5 \end{pmatrix}. \tag{10}$$

Here we have assumed that the rate constants are equal to unity, $c_1 = c_2 = 1$, 
and that the initial condition is $[S_1, S_2, S_3] = [m, m, 0]$, so $m$ is the maximum 
number of molecules, and $m = N - 1$. The model is three-dimensional, but 
the conservation law together with this initial condition allow it to be treated 
as effectively one-dimensional, by letting $x$ represent $S_3$, say. Then a possible 
corresponding continuous model (5) has drift coefficient $a(x) = -c_1 x + c_2(m - x)^2$ 
and diffusion coefficient $b(x) = c_1 x + c_2(m - x)^2$. 

Finally, we introduce the Schlögl model [13], which consists of two reversible 
reactions 

$$B_1 + 2X \rightleftharpoons 3X, \qquad B_2 \rightleftharpoons X.$$

Here $B_1 = 1 \times 10^{5}$, $B_2 = 2 \times 10^{5}$. The associated matrix is given below, where 
$k_1 = 3 \times 10^{-7}$, $k_2 = 1 \times 10^{-4}$, $k_3 = 1 \times 10^{-3}$, and $k_4 = 3.5$. In theory this matrix 
is infinite but we truncate to a finite section for numerical computation. 

Schlögl model matrix (an example of a trimolecular model scheme) 

$$A_{i,j} = \begin{cases} \frac{1}{6} k_2 (j-1)(j-2)(j-3) + k_4 (j-1), & i = j-1, \\ k_3 B_2 + \frac{1}{2} k_1 B_1 (j-1)(j-2), & i = j+1. \end{cases} \tag{11}$$

For $i = j$, the diagonal entry is 
$-\big(\frac{1}{6} k_2 (j-1)(j-2)(j-3) + k_4 (j-1) + k_3 B_2 + \frac{1}{2} k_1 B_1 (j-1)(j-2)\big)$, 
so that columns sum to zero. The first column is indexed by $j = 1$ and corresponds to a 
state with $0 = j - 1$ molecules. The corresponding continuous model (5) has drift 
coefficient $a(x) = k_3 B_2 + \frac{1}{2} k_1 B_1 x(x-1) - \frac{1}{6} k_2 x(x-1)(x-2) - k_4 x$ and diffusion 
coefficient $b(x) = k_3 B_2 + \frac{1}{2} k_1 B_1 x(x-1) + \frac{1}{6} k_2 x(x-1)(x-2) + k_4 x$. 
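
The tridiagonal structure in (7) and (9) is easy to assemble and check by machine. The following Python sketch (helper names are illustrative; unit rate constants $c_1 = c_2 = 1$ are assumed, as in the text) builds both matrices as lists of rows and confirms that every column sums to zero, which is the conservation-of-probability requirement.

```python
def monomolecular_matrix(m):
    """(m+1) x (m+1) matrix of (7) with c1 = c2 = 1, as a list of rows."""
    n = m + 1
    A = [[0] * n for _ in range(n)]
    for j in range(1, n + 1):            # 1-based column index, as in (7)
        if j > 1:
            A[j - 2][j - 1] = j - 1      # entry with i = j - 1
        A[j - 1][j - 1] = -m             # entry with i = j
        if j < n:
            A[j][j - 1] = m - j + 1      # entry with i = j + 1
    return A


def bimolecular_matrix(m):
    """(m+1) x (m+1) matrix of (9) with c1 = c2 = 1, as a list of rows."""
    n = m + 1
    A = [[0] * n for _ in range(n)]
    for j in range(1, n + 1):
        up = j - 1                       # entry with i = j - 1
        down = (m - j + 1) ** 2          # entry with i = j + 1
        if j > 1:
            A[j - 2][j - 1] = up
        A[j - 1][j - 1] = -up - down     # diagonal: minus the column's off-diagonal sum
        if j < n:
            A[j][j - 1] = down
    return A


def column_sums(A):
    """Sum of each column; all zeros for a valid generator matrix."""
    n = len(A)
    return [sum(A[i][j] for i in range(n)) for j in range(n)]
```

For example, `monomolecular_matrix(5)` reproduces the $6 \times 6$ instance of (7), and `column_sums` returns a list of zeros for both models.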


3 Pseudospectra Are Important for Stochastic Processes 


All three matrices introduced in the previous section exhibit interesting pseudospectra. 
As an illustration of the way that the pseudospectra manifest themselves, we will 
now consider numerically computing eigenvalues of the three matrices. This is a first 
demonstration that one must respect the pseudospectrum when crafting a numerical 
method. That issue will be important again when we use numerical methods based 
on contour integrals to compute Mittag-Leffler matrix functions. 

The reader can readily verify that using any standard eigenvalue solver, such 
as eig in MATLAB, leads to numerically computed eigenvalues that are complex 
numbers. However, these numerically computed complex eigenvalues are wrong: 
the true eigenvalues are purely real. It is the same story for all three models. See the 
numerically computed eigenvalues displayed here in Figs. 1 and 2, for example. We 
suggest this effect happens much more widely for models arising in computational 
biology. 

Fig. 1 The field of values (solid line resembling an oval-like shape) for the discrete 
monomolecular matrix in (7) when $N = 200$ ($m = N - 1$), as computed by Chebfun [4]. The crosses 
mark numerically computed eigenvalues via eig, but these complex eigenvalues are wrong. The true 
eigenvalues are purely real and are marked on the real axis by dots (note that the dots are so 
close together that they may seem to be almost a solid line). These correct values can be obtained by 
instead computing the eigenvalues of the symmetrized matrix, after using the diagonal scaling 
matrix created by the MATLAB listing provided here 

Fig. 2 Same as Fig. 1, for the bimolecular model. The field of values for the discrete bimolecular 
matrix in (9) when $N = 200$ is the solid line resembling an oval-like shape, as computed by 
Chebfun [4] 

Figures 1 and 2 make use of the following method to create a diagonal scaling 
matrix. Here is a MATLAB listing to create a diagonal matrix D that symmetrizes a 
tridiagonal matrix of dimension N of the form described in any of the three models 
considered here: 


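d(1) = 1;                  % the scaling is only defined up to a constant factor
for i = 1:N-1
  % choose d(i+1) so that entries (i,i+1) and (i+1,i) of D*A*inv(D) agree
  d(i+1) = sqrt(A(i,i+1)/A(i+1,i))*d(i);
end
D = diag(d); Asym = D*A*inv(D);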


This symmetrization by a diagonal matrix in a similarity transform is known to 
physicists as a gauge transformation, and it is described by Trefethen and Embree 
[15, Section 12]. A real symmetric matrix has real eigenvalues so the eigenvalues 
of $DAD^{-1}$ are purely real. The matrix $DAD^{-1}$ and the matrix $A$ share the same 
eigenvalues because they are similar. This is one way to confirm that the true 
eigenvalues of $A$ are purely real. Numerical algorithms typically perform well on 
real symmetric matrices, so it is better to numerically compute the eigenvalues of 
$DAD^{-1}$ than to compute eigenvalues of $A$ directly. 
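
The same scaling can be checked on a small case without any linear algebra library. The Python sketch below (function names are illustrative) applies the recurrence $d_{i+1} = d_i \sqrt{A_{i,i+1}/A_{i+1,i}}$ to the $m = 5$ monomolecular matrix and forms $DAD^{-1}$ entrywise. It assumes, as holds for the models here, that the off-diagonal entries $A_{i,i+1}$ and $A_{i+1,i}$ are positive; for large $N$ the entries of $d$ can vary over many orders of magnitude.

```python
import math


def symmetrizing_scaling(A):
    """Diagonal entries d of D such that D A D^{-1} is symmetric (A tridiagonal)."""
    n = len(A)
    d = [1.0] * n
    for i in range(n - 1):
        d[i + 1] = math.sqrt(A[i][i + 1] / A[i + 1][i]) * d[i]
    return d


def similarity_transform(A, d):
    """Entries of D A D^{-1}, i.e. d_i * A_ij / d_j."""
    n = len(A)
    return [[d[i] * A[i][j] / d[j] for j in range(n)] for i in range(n)]


# The m = 5 instance of the monomolecular matrix (7).
A5 = [
    [-5, 1, 0, 0, 0, 0],
    [5, -5, 2, 0, 0, 0],
    [0, 4, -5, 3, 0, 0],
    [0, 0, 3, -5, 4, 0],
    [0, 0, 0, 2, -5, 5],
    [0, 0, 0, 0, 1, -5],
]
S5 = similarity_transform(A5, symmetrizing_scaling(A5))
```

Each symmetrized off-diagonal entry is the geometric mean of the original pair, for instance $\sqrt{1 \cdot 5}$ in positions (1,2) and (2,1).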

The $\epsilon$-pseudospectra [15] can be defined as the region of the complex plane 
where the norm of the resolvent matrix is large: the set of $z \in \mathbb{C}$ such that 

$$\|(zI - A)^{-1}\| > \frac{1}{\epsilon}. \tag{12}$$

Fig. 3 Pseudospectrum of a 2000 x 2000 finite section of the matrix in (11) representing 
the Schlögl reactions, as computed by EigTool. Contours correspond to 
$\epsilon = 10^{-2}, 10^{-4}, 10^{-6}, 10^{-8}, 10^{-10}$ in (12). The contour closest to the real axis 
corresponds to $10^{-10}$ 

We use the 2-norm in this article. Equivalently, this is the region of the complex 
plane for which $z$ is an eigenvalue of $(A + E)$ for some perturbing matrix $E$ that has a 
small norm: $\|E\| < \epsilon$. Thus the wrong complex ‘eigenvalues’ appearing as crosses 
in Figs. 1 and 2 offer a way to (crudely) visualise the pseudospectrum. Numerical 
visualizations of the pseudospectra of the family of monomolecular matrices defined 
by (7) can be found in [8], and the pseudospectra of the bimolecular reaction defined 
by (9) can be found in [12]. Here, as an example of a trimolecular scheme, we 
present in Fig. 3 the pseudospectra for the Schlögl reactions (11), as computed by 
EigTool. The resolvent norm is largest near the negative real axis, as we would 
expect because that is where the true eigenvalues are located. The level curves are 
not equally spaced in the figure, becoming bunched up together, suggesting a more 
interesting structure that remains to be elucidated as the dimension of the matrix 
grows. An interesting experiment is to vary the parameters, namely the product 
$k_3 B_2$, as is done in [14, Figure 10]. In experiments not shown here, we find that 
when $k_3 B_2$ is varied, the numerical computation of the eigenvalues becomes less 
reliable (for example, when $k_3 = 1.4 \times 10^{-3}$ and $B_2$ is unchanged). 

Figures 1 and 2 also display the numerical range of a matrix. That is also known 
as the field of values, which we denote by $W$, and it is defined for a matrix $A$ as the 
set of complex numbers that come from a quadratic form with the matrix 

$$W(A) = \{x^* A x \in \mathbb{C} : \|x\|_2 = 1\}. \tag{13}$$

The field of values always contains the eigenvalues. A simple algorithm [9] for 
computing $W(A)$ involves repeatedly ‘rotating’ the matrix and then finding the 
numerical abscissa 

$$\frac{1}{2} \max\big(\mathrm{eig}(A + A^*)\big). \tag{14}$$


88 S. MacNamara et al. 


The $\epsilon$-pseudospectrum of a matrix is contained in an $\epsilon$-neighbourhood of the field 
of values, in a sense that can be made a precise theorem [15]. We see an example of 
this in Figs. 1 and 2. 


4 A Mittag-Leffler Stochastic Simulation Algorithm 


In this short section we provide a method for Monte Carlo simulation of the 
solutions of the stochastic processes that we describe. This Monte Carlo method 
can be considered an alternative to contour integral methods, that we describe later. 
As the Monte Carlo methods do not seem to fail in the presence of non-normality, 
they can be a useful cross-check on contour integral methods. 

Here is a MATLAB listing to simulate Monte Carlo sample paths of the 
monomolecular model of isomerization corresponding to the matrix in (7), with 
Mittag-Leffler waiting times: 


t_final = 100; m = 10; c1 = 1; c2 = 1; v = [-1, 1; 1, -1];
initial_state = [m, 0]'; alpha = 0.9; al1 = 1/alpha;
alpi = alpha*pi; sinalpi = sin(alpi); cosalpi = cos(alpi);
t = 0; x = initial_state; T = [0]; X = [initial_state];
while (t < t_final)
  a(1) = c1*x(1); a(2) = c2*x(2); asum = sum(a);        % propensities
  r1 = rand; r2 = rand; z = sinalpi/tan(alpi*r2) - cosalpi;
  tau = -(z/asum)^(al1)*log(r1);                         % Mittag-Leffler waiting time
  r = rand*asum; j = find(r < cumsum(a), 1, 'first');    % choose reaction j
  x = x + v(:,j); t = t + tau; T = [T t]; X = [X x];
end
if (t > t_final)                                         % clip the path at t_final
  T(end) = t_final; X(:,end) = X(:,end-1);
end
figure(1); stairs(T, X(1,:), 'LineWidth', 2);
xlabel('time'); ylabel('molecules of S_1');
title({'Isomerisation: a monomolecular model'; ...
  'Mittag-Leffler SSA'; ['\alpha = ', num2str(alpha)]})

This is a Markov process with exponential waiting times when $\alpha = 1$, in 
which case the program reduces to a version of the Gillespie Stochastic Simulation 
Algorithm. Figure 4 shows a sample path simulated with this MATLAB listing. 
A histogram of many such Monte Carlo samples is one way to approximate the 
solution of (3). A different way to compute that solution is via a contour integral, as 
we describe in the next section, and as displayed in Fig. 5. 
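
For readers without MATLAB, the same simulation idea can be sketched in Python using only the standard library. This is an illustrative port under stated assumptions, not the listing used for Fig. 4: the function names are hypothetical, the uniform variates are drawn from (0, 1] to avoid `log(0)` and `tan(0)`, and a tiny negative `z` caused by rounding is clipped to zero. The final event is clipped at `t_final`.

```python
import math
import random


def mittag_leffler_waiting_time(rate, alpha, rng):
    """One Mittag-Leffler waiting time with the given rate; exponential when alpha = 1."""
    r1 = 1.0 - rng.random()          # uniform on (0, 1]
    r2 = 1.0 - rng.random()          # uniform on (0, 1]
    api = alpha * math.pi
    z = math.sin(api) / math.tan(api * r2) - math.cos(api)
    z = max(z, 0.0)                  # guard against rounding below zero
    return -((z / rate) ** (1.0 / alpha)) * math.log(r1)


def simulate_ml_ssa(m=10, c1=1.0, c2=1.0, alpha=0.9, t_final=100.0, seed=0):
    """Sample path of the monomolecular model S1 <-> S2 started from [m, 0]."""
    rng = random.Random(seed)
    stoich = [(-1, 1), (1, -1)]      # state change for reaction 1 and reaction 2
    x = [m, 0]
    t = 0.0
    times, states = [0.0], [tuple(x)]
    while True:
        a = [c1 * x[0], c2 * x[1]]   # propensities
        asum = a[0] + a[1]
        tau = mittag_leffler_waiting_time(asum, alpha, rng)
        if t + tau > t_final:        # no further event before t_final
            times.append(t_final)
            states.append(tuple(x))
            break
        j = 0 if rng.random() * asum < a[0] else 1
        x[0] += stoich[j][0]
        x[1] += stoich[j][1]
        t += tau
        times.append(t)
        states.append(tuple(x))
    return times, states
```

A call such as `simulate_ml_ssa(m=100, alpha=0.7, t_final=1.0)` returns the event times and the states visited; a histogram over many seeds approximates the solution of (3).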

Fig. 4 A sample simulation of the monomolecular reaction, with the Mittag-Leffler SSA provided 
here in the MATLAB listing. Parameters: $\alpha = 0.7$, $t = 1$, and initial condition a Dirac delta 
distribution on the state $[S_1, S_2] = [m, 0] = [100, 0]$. Compare with Fig. 5 

Fig. 5 A discrete probability mass function is the solution of a Mittag-Leffler master equation (3) for 
the monomolecular reaction, and can be computed with the MATLAB listing provided here that uses 
a hyperbolic contour. Parameters: $\alpha = 0.7$, $t = 1$, and initial condition a Dirac delta distribution on 
the state $[S_1, S_2] = [m, 0] = [100, 0]$. The x-axis shows the number of molecules of $S_2$. Compare 
with Fig. 4 

A time-fractional diffusion equation in the form of (3) in two space dimensions 
is solved by such a contour integral method in [16, Figure 16.2]. As an aside, we can 
modify the code above to offer a method for Monte Carlo simulation of sample paths 
of a closely related stochastic process, as follows. Simplify to one space dimension 
(it is also straightforward to simulate in two dimensions), and suppose that the 
governing $(m+1) \times (m+1)$ matrix is 

$$A = \begin{pmatrix} -1 & 1 & & & \\ 1 & -2 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2 & 1 \\ & & & 1 & -1 \end{pmatrix} \tag{15}$$

with initial vector $p(0)$ being zero everywhere except the middle entry, i.e. the 
round(m/2)th entry is one. Then (3) corresponds to a random walk on the integers 
$0, 1, \ldots, m$, beginning in the middle. To simulate, modify the first few lines of the 
above code segment to 


t_final = 1; m = 10; v = [-1, 1];
initial_state = round(m/2);


and also the first line in the while loop, to 

a(1) = (x>0); a(2) = (x<m);


leaving the rest unchanged. 


5 Computing a Mittag-Leffler Matrix Function 


In this section we first establish the utility of contour integral methods by directly 
computing the desired solution of (3). This is motivation for exploring bounds on 
the pseudospectrum, so that informed choices can be made when designing contour 
methods. 


5.1 Computing Contour Integrals 


Start with (3). Take the Laplace transform. Then take the inverse Laplace transform. 
Of course those two steps arrive at the same solution we started with. The advantage 
is that the desired solution of (3) is now represented as a contour integral [12, 13, 16]: 

$$p(t) = E_{\alpha}(A t^{\alpha})\, p(0) = \frac{1}{2\pi i} \int_{\Gamma} \exp(zt) \big(zI - z^{1-\alpha} A\big)^{-1} p(0)\, dz. \tag{16}$$


The eigenvalues of $A$ must lie to the left of the contour $\Gamma$. In all our examples, 
they lie along the negative real axis. Also, we exploit symmetry when $A$ is real. 
Apart from those requirements, we are free to choose the contour. Typical choices 
are displayed in Fig. 6. The $\exp(zt)$ factor is nearly zero when the real part of $z$ 
is sufficiently negative. This motivates choosing $\Gamma$ to go into the left-half plane 
because then we can neglect the infinite part of the contour that lies to the left of, 
say, $-30$, and still obtain high accuracy. To evaluate the integral on the remaining 
finite part of the contour, we use quadrature. The trapezoidal rule is a good choice. 
For a quadrature rule with $M$ nodes $z_k$ on the contour, we approximate (16) by 

$$p(t) \approx \sum_{k=1}^{M} w_k(t)\, u_k, \tag{17}$$

where we solve a linear system for the vector $u_k$ at each node $k$: 

$$\big(z_k I - z_k^{1-\alpha} A\big)\, u_k = p(0).$$

Fig. 6 A parabolic contour (solid) as described in [16], and a hyperbolic contour (dashed, with 
$M = 16$) used in the MATLAB listing provided here. Nodes for a quadrature rule as in (17) are also 
marked on the contours 


The values $w_k(t)$ (and note that we allow these numbers to depend on $t$) incorporate 
both the usual weights coming from the quadrature rule on a line, and also any 
scalings coming from parameterizing the contour. There are variations of this 
quadrature scheme. When $\alpha = 1$, this procedure simplifies to compute the familiar 
matrix exponential solutions of the usual Markov systems. 

Here is a MATLAB listing, adapted from Le et al. [10], to compute the 
Mittag-Leffler solution by applying the quadrature recipe (17) on a hyperbolic contour. 

function p = hyperbola_MittagLeffler(t,A,v,alpha,M)
% Approximates E_alpha(A*t^alpha)*v by the trapezoidal rule on a hyperbolic contour
alpha1 = 1-alpha; n = length(v); Dxi = 1.08179214/M;
xi = [-M:M]*Dxi; delta = 1.17210423; mu = 4.49207528*M/t;
z = mu * ( 1 - sin(complex(delta,-xi)) );     % quadrature nodes on the contour
dz = 1i * mu * cos(complex(delta,-xi));       % derivative of z with respect to xi
c = Dxi * dz .* exp(z*t) / (2*pi*1i);         % weights w_k(t) in (17)
I = speye(n); p = zeros(size(v));
for k = 1:M
  p = p + c(k) * ((z(k)*I - z(k)^(alpha1)*A)\v);
end
% nodes come in conjugate pairs when A and v are real; k = M+1 is the middle node
p = 2*real(p); k = M+1;
p = p + real(c(k) * ((z(k)*I - z(k)^(alpha1)*A)\v));


We choose M = 16 quadrature points. An example of the hyperbolic contour 
being used here is displayed in Fig. 6. Figure 5 shows a solution of (3) computed 
with this MATLAB listing. 
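
A scalar version of the same quadrature is a convenient cross-check, because for a $1 \times 1$ "matrix" $a$ the linear solve in (17) becomes a division and the answer can be compared with the series (4). The Python sketch below is an illustration under that simplifying assumption (function names are hypothetical; the contour constants are copied from the listing above), and it sums over all $2M+1$ nodes rather than exploiting symmetry.

```python
import cmath
import math


def mittag_leffler_series(z, alpha, terms=100):
    """Partial sum of the series (4); adequate only for modest |z|."""
    return sum(z ** k / math.gamma(alpha * k + 1.0) for k in range(terms))


def hyperbola_mittag_leffler_scalar(t, a, alpha, M=16):
    """Approximate E_alpha(a * t**alpha) for scalar a <= 0 via the hyperbolic contour."""
    dxi = 1.08179214 / M
    delta = 1.17210423
    mu = 4.49207528 * M / t
    total = 0j
    for k in range(-M, M + 1):
        w = complex(delta, -k * dxi)
        z = mu * (1.0 - cmath.sin(w))            # node on the contour
        dz = 1j * mu * cmath.cos(w)              # derivative of z w.r.t. xi
        c = dxi * dz * cmath.exp(z * t) / (2j * math.pi)
        total += c / (z - z ** (1.0 - alpha) * a)  # scalar analogue of the solve in (17)
    return total.real
```

With $\alpha = 1$ the result should agree with $\exp(at)$, and with $0 < \alpha < 1$ it should agree with the series for small arguments, for example `hyperbola_mittag_leffler_scalar(1.0, -1.0, 0.7)` against `mittag_leffler_series(-1.0, 0.7)`.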


5.2 Estimating the Field of Values 


One strategy to choose the contour, $\Gamma$, is to first compute a pseudospectrum of 
the matrix with EigTool [17] (as in Fig. 3 for example). Then choose $\Gamma$ so 
that $\|(zI - A)^{-1}\|$ is never too large for $z \in \Gamma$. For example, we might choose 
the contour so that the bound ||(zI — A)~!|| < 10° holds (which corresponds to 
€ = 107? in (12)). Arguably, the value 10? could be replaced by something smaller, 
say O(1). The particular value would depend on the application. In numerical 
experiments with these models, the issue seems to matter only when $\|(zI - A)^{-1}\|$ 
is significantly larger than, say, 100. Such an unfavourable situation can certainly 
arise. It is demonstrated to happen on the parabolic contour displayed here in 
Fig. 6 for bimolecular reactions [12]. The issue is also addressed by in ’t Hout and 
Weideman [7] for Black-Scholes models. Figure 6 compares a parabolic contour 
used for a bimolecular model [12] with a hyperbolic contour used here. A similar 
comparison of these contours and discussion can be found in a survey article by 
Trefethen and Weideman [16, Figure 15.2]. 

However, a drawback of this procedure is that it is expensive to first compute the 
pseudospectrum. It would therefore be preferable to instead find an estimate of the 
region where the resolvent is large by some more efficient means. We find one such 
estimate next. 

Trefethen and Embree [15] discuss various ways to bound the pseudospectrum. 
One approach uses the fact that if $z$ is not inside the field of values of a matrix, then 
the norm of the resolvent is bounded by the reciprocal of the distance to the field of values: 

$$\|(zI - A)^{-1}\| \leq \frac{1}{\mathrm{dist}(z, W(A))}.$$

This result suggests an idea for adapting the contour: choose the contour to be 
outside of the field of values. 

We now estimate the field of values of the spatial operator in the Fokker–Planck 
PDE (5); a similar approach has been applied to the Black-Scholes equation of 
finance [7, Theorem 3.1]. We focus on the particular example of the monomolecular 
model, but we believe this approach will be extended to the other models in future 
work. 

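Begin by writing the Fokker–Planck equation in the form of a conservation law, 

$$u_t + \mathcal{A}u = 0 \quad \text{for } 0 < x < m \text{ and } t > 0,$$

where, using a dash for $\partial/\partial x$, 

$$\mathcal{A}u = -\tfrac{1}{2}(bu)'' + (au)' = -\big(\tfrac{1}{2} b u' + B u\big)' \quad \text{and} \quad B(x) = -a(x) + \tfrac{1}{2} b'(x). \tag{18}$$

For the monomolecular model, the coefficients are 

$$b(x) = c_1 x + c_2(m - x) \quad \text{and} \quad a(x) = -c_1 x + c_2(m - x).$$

Here, $c_1$, $c_2$ and $m$ are positive constants. Note that $\mathcal{A}$ is uniformly elliptic because 
$b(x) \geq \min(m c_1, m c_2)$ for $0 \leq x \leq m$. 
We impose either homogeneous Dirichlet boundary conditions, 

$$u(0) = 0 = u(m), \tag{19}$$

or else zero-flux boundary conditions, 

$$\tfrac{1}{2} b u' + B u = 0 \quad \text{for } x \in \{0, m\}. \tag{20}$$

The domain of $\mathcal{A}$ is then the complex vector space $D(\mathcal{A})$ of $C^2$ functions 
$v : [0, m] \to \mathbb{C}$ satisfying the chosen boundary conditions. 
Denote the numerical range (field of values) of $\mathcal{A}$ by 

$$W(\mathcal{A}) = \{\langle \mathcal{A}u, u \rangle : u \in D(\mathcal{A}) \text{ with } \langle u, u \rangle = 1\}, \tag{21}$$

where $\langle u, v \rangle = \int_0^m u \bar{v}$. Compare this definition of $W(\mathcal{A})$ in the continuous setting 
with the definition (13) in the discrete setting. For either (19) or (20), integration by 
parts gives 

$$\langle \mathcal{A}u, u \rangle = \frac{1}{2} \int_0^m b\,|u'|^2 + \int_0^m B u \bar{u}'.$$

We write 

$$X + iY = \langle \mathcal{A}u, u \rangle = \tfrac{1}{2} P - Q \quad \text{where} \quad P = \int_0^m b\,|u'|^2 \quad \text{and} \quad Q = -\int_0^m B u \bar{u}',$$

and assume that 

$$\frac{B(x)^2}{b(x)} \leq K \quad \text{and} \quad 0 \leq \beta_0 \leq \frac{B'(x)}{2} \leq \beta_1, \quad \text{for } 0 \leq x \leq m.$$

In the case of zero-flux boundary conditions (20), we require the additional 
assumption 

$$B(0) \leq 0 \leq B(m).$$

Then we claim that for Dirichlet boundary conditions (19), 

$$Y^2 \leq 2K(X + \beta_1) - \beta_0^2, \tag{22}$$

whereas for zero-flux boundary conditions (20), 

$$Y^2 \leq 2K(X + \beta_1). \tag{23}$$

To derive these estimates, first observe that 

$$2\,\Re Q = Q + \bar{Q} = -\int_0^m B\,(u \bar{u}' + \bar{u} u') = -\int_0^m B\,(|u|^2)' = -\big[B|u|^2\big]_0^m + \int_0^m B'\,|u|^2.$$

Assume that $\langle u, u \rangle = 1$. The bounds on $B'$ give 

$$\beta_0 - \big[\tfrac{1}{2} B |u|^2\big]_0^m \leq \Re Q \leq \beta_1 - \big[\tfrac{1}{2} B |u|^2\big]_0^m,$$

and by the Cauchy–Schwarz inequality, 

$$|Q|^2 \leq \Big(\int_0^m B^2 |u'|^2\Big)\Big(\int_0^m |u|^2\Big) = \int_0^m \frac{B^2}{b}\, b\,|u'|^2 \leq \Big(\max_{[0,m]} \frac{B^2}{b}\Big) \int_0^m b\,|u'|^2 \leq KP;$$

thus, 

$$Y^2 = (-\Im Q)^2 = |Q|^2 - (\Re Q)^2 \leq KP - (\Re Q)^2.$$

For Dirichlet boundary conditions we have $\big[B|u|^2\big]_0^m = 0$, so $\beta_0 \leq \Re Q \leq \beta_1$ and 
hence 

$$P = 2X + 2\,\Re Q \leq 2X + 2\beta_1 \quad \text{and} \quad Y^2 \leq KP - \beta_0^2,$$

implying (22). For zero-flux boundary conditions, the extra assumptions on $B$ 
ensure that $\big[B|u|^2\big]_0^m \geq 0$, so $\Re Q \leq \beta_1$ and hence 

$$P = 2X + 2\,\Re Q \leq 2X + 2\beta_1 \quad \text{and} \quad Y^2 \leq KP,$$

implying (23). 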
In the simple case $c_1 = c_2 = c$ we have 

$$b(x) = cm \quad \text{and} \quad B(x) = -a(x) = c(2x - m),$$

and therefore 

$$\frac{B(x)^2}{b(x)} = \frac{c}{m}(2x - m)^2 \leq cm \quad \text{and} \quad B'(x) = 2c \quad \text{for } 0 \leq x \leq m,$$

giving $K = cm$ and $\beta_0 = \beta_1 = c$. In addition, $B(0) = -cm < 0$ and $B(m) = 
cm > 0$. Thus, for zero-flux boundary conditions, 

$$Y^2 \leq 2cm(X + c).$$

Note that the sign convention in the notation in (18) makes the operator $\mathcal{A}$ positive 
definite (though not symmetric), whereas our model matrices are negative definite, 
so $W(\mathcal{A})$ provides an approximation for $W(-A) = -W(A)$. We therefore have to 
flip signs in (22) and (23) to get estimates for $W(A)$, as in the following bound. 


Parabolic estimate of the field of values for the monomolecular model 
matrix when $c_1 = c_2 = c$ in (7): 

$$Y^2 \leq 2cm(c - X). \tag{24}$$

This bound is displayed in Fig. 7. As in the figure, our matrix examples typically 
have a unique zero eigenvalue, with all other eigenvalues having negative real part, 
and the numerical abscissa (14) is typically strictly positive. Having derived a bound 
in the continuous setting, we can only regard it as an approximation in the discrete 
setting. Nonetheless, in these numerical experiments it does indeed seem to offer a 
useful bound for the discrete case. 
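
One inexpensive way to explore the estimate (24) numerically is to sample the quadratic form in (13) for random vectors and test each value against the parabola. The Python sketch below does this for the monomolecular matrix with $c = 1$. It is exploratory only: random vectors tend to give values well inside $W(A)$, so this does not trace the boundary of the field of values, and the helper names and sampling scheme are choices made here.

```python
import random


def monomolecular_matrix(m):
    """(m+1) x (m+1) matrix of (7) with c1 = c2 = 1, as a list of rows."""
    n = m + 1
    A = [[0] * n for _ in range(n)]
    for j in range(n):                   # 0-based column index
        if j > 0:
            A[j - 1][j] = j
        A[j][j] = -m
        if j < n - 1:
            A[j + 1][j] = m - j
    return A


def rayleigh_quotient(A, x):
    """x^* A x / x^* x for a tridiagonal A and a complex vector x."""
    n = len(A)
    norm2 = sum(abs(xi) ** 2 for xi in x)
    total = 0j
    for i in range(n):
        acc = 0j
        for j in (i - 1, i, i + 1):      # only the three diagonals are non-zero
            if 0 <= j < n:
                acc += A[i][j] * x[j]
        total += x[i].conjugate() * acc
    return total / norm2


def inside_parabola(w, c, m):
    """True if w = X + iY satisfies the estimate (24): Y^2 <= 2cm(c - X)."""
    return w.imag ** 2 <= 2.0 * c * m * (c - w.real)


def sample_field_of_values(m, samples=200, seed=0):
    """Values of the quadratic form for random complex Gaussian vectors."""
    rng = random.Random(seed)
    A = monomolecular_matrix(m)
    points = []
    for _ in range(samples):
        x = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(m + 1)]
        points.append(rayleigh_quotient(A, x))
    return points
```

For instance, `[inside_parabola(w, 1.0, 20) for w in sample_field_of_values(20)]` reports which sampled points satisfy (24); a boundary-tracing algorithm such as the one in [9] is needed for a sharper comparison.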


Discussion A natural next step is to incorporate this bound into the design of the 
contour for methods such as the MATLAB listing provided in Sect. 5. That will 
be pursued elsewhere. This article has thus laid the foundations for such attractive 
future research. 

Fig. 7 The field of values for the discrete matrix in (7) when $c_1 = c_2 = 1$ and $m = 100$, as 
computed by Chebfun. The dashed line is a parabolic bound in (24) for the field of values of 
the corresponding continuous PDE (5). This parabolic bound is semi-analytic and requires some 
analysis of the equation ‘by hand.’ Also displayed is a hyperbolic contour, as in Fig. 6 and as used in 
the MATLAB listing provided here 


It is worth commenting that if a matrix is real symmetric (unlike the examples in 
this article) then the pseudospectrum is not an issue so it would not be good to make 
the contour wider (and thus also longer), and instead previously proposed (and often 
optimized and shorter) contours such as surveyed in [16] would be good choices. 
Making the contour wider and thus longer does impact the numerical method, but 
this is unavoidable in applications where the behaviour of the pseudospectra is 
an issue, such as the examples discussed here. It is also worth commenting on 
numerical methods for computing the field of values. Here we used Chebfun, 
which in turn is based on Johnson’s algorithm [9]. It is conceivable that a different 
algorithm might be devised to estimate the field of values more cheaply for the 
purposes of contour integrals, but we do not explore such an approach here. 
However, we do observe in these numerical experiments, as displayed in the figures 
for example, that the field of values, and therefore also the estimate that we derive 
from it, seems to be too conservative. That is especially noticeable in regions where 
the real part is large and negative. It might therefore also be worth considering 
methods that estimate the pseudospectra directly and more cheaply. We began with the 
continuous PDE so using a coarse mesh in a PDE-solver, such as a finite difference 
method, might be one way to obtain a cheap estimate of the pseudospectra. 
6 Conclusion 


We have discussed, and illustrated by example, the significance of the pseudospectra 
for stochastic models in biology, including both Markovian and non-Markovian 
generalisations. Although we focused exclusively on the 2-norm, the pseudospectra 
of these stochastic processes are perhaps more naturally addressed in the 1-norm. 
In that regard, future work will explore ways to incorporate the 1-norm methods 
described by Higham and Tisseur [6] to choose the contour adaptively. 

Numerical methods to compute the matrix exponential functions and matrix 
Mittag-Leffler functions via contour integrals must take the pseudospectrum of the 
matrix into account. In particular, such methods must choose contours that are wide 
enough, and that adapt to avoid regions of the complex plane where the norm of 
the resolvent matrix is too large. We have derived a simple, parabolic bound on 
the field of values of an associated Fokker–Planck PDE, which can be used as an 
approximation to the field of values of the corresponding discrete matrix model. We 
believe this bound can help inform us when making a good choice for the contour. 
Ultimately, how to devise contours in a truly adaptive fashion, so that they can 
be efficiently computed, automatically and without a priori analysis, remains an 
important open question. 


Acknowledgements The authors thank the organisers of the Computational Inverse Problems 
Adam Spannaus, Vasileios Maroulas, David J. Keffer, and Kody J. H. Law 


Abstract Point set registration involves identifying a smooth invertible trans- 
formation between corresponding points in two point sets, one of which may 
be smaller than the other and possibly corrupted by observation noise. This 
problem is traditionally decomposed into two separate optimization problems: (1) 
assignment or correspondence, and (2) identification of the optimal transformation 
between the ordered point sets. In this work, we propose an approach solving 
both problems simultaneously. In particular, a coherent Bayesian formulation of the 
problem results in a marginal posterior distribution on the transformation, which 
is explored within a Markov chain Monte Carlo scheme. Motivated by Atomic 
Probe Tomography (APT), in the context of structure inference for high entropy 
alloys (HEA), we focus on the registration of noisy sparse observations of rigid 
transformations of a known reference configuration. Lastly, we test our method on 
synthetic data sets. 


1 Introduction 


In recent years, a new class of materials has emerged, called High Entropy Alloys 
(HEAs). The resulting HEAs possess unique mechanical properties and have shown 
marked resistance to high-temperature, corrosion, fracture and fatigue [5, 18]. HEAs 
demonstrate a ‘cocktail’ effect [7], in which the mixing of many components 
results in properties not possessed by any single component individually. Although 
these metals hold great promise for a wide variety of applications, the greatest 
impediment in tailoring the design of HEAs to specific applications is the inability 
to accurately predict their atomic structure and chemical ordering. This prevents 
Materials Science researchers from constructing structure-property relationships 
necessary for targeted materials discovery. 

An important experimental characterization technique used to determine local 
structure of materials at the atomic level is Atomic Probe Tomography (APT) 
[8, 10]. APT provides an identification of the atom type and its position in space 
within the sample. APT has been successfully applied to the characterization of 
the HEA, AlCoCrCuFeNi [16]. Typically, APT data sets consist of $10^6$ to $10^7$ atoms. 
Sophisticated reconstruction techniques are employed to generate the coordinates 
based upon the construction of the experimental apparatus. APT data has two main 
drawbacks: (1) up to 66% of the data is missing and (2) the recovered data is 
corrupted by noise. The challenge is to uncover the true atomic level structure 
and chemical ordering amid the noise and missing data, thus giving material 
scientists an unambiguous description of the atomic structure of these novel alloys. 
Ultimately, our goal is to infer the correct spatial alignment and chemical ordering 
of a dataset, herein referred to as a configuration, containing up to $10^7$ atoms. This 
configuration will be probed by individual registrations of the observed point sets in 
a neighborhood around each atom. 

In this paper we outline our approach to this unique registration problem of 
finding the correct chemical ordering and atomic structure in a noisy and sparse 
dataset. While we do not solve the problem in full generality here, we present 
a Bayesian formulation of the model and a general algorithmic approach, which 
allows us to confront the problem with a known reference, and can be readily 
generalized to the full problem of an unknown reference. 

In Sect. 2 we describe the problem and our Bayesian formulation of the statistical 
model. In Sect. 3, we describe Hamiltonian Monte Carlo, a sophisticated Markov 
chain Monte Carlo technique used to sample from multimodal densities, which we 
use in our numerical experiments in Sect. 4. Lastly, we conclude with a summary of 
the work presented here and directions for future research. 


2 Problem Statement and Statistical Model 


An alloy consists of a large configuration of atoms, henceforth “points”, which are 
rotated and translated instances of a reference collection of points, denoted $X = 
(X_1, \dots, X_N)$, $X_i \in \mathbb{R}^d$ for $1 \le i \le N$, which is the matrix representation of 
the reference points. The tomographic observation of this configuration is missing 
some percentage of the points and is subject to noise, which is assumed additive and 
Gaussian. The sample consists of a single point and its M nearest neighbors, where 
$M$ is of the order 10. If $p \in [0, 1]$ is the percent observed (expressed as a fraction), i.e. $p = 1$ means all points 
are observed and $p = 0$ means no points are observed, then the reference point set 
will comprise $N = \lceil M/p \rceil$ points. We write the matrix representation of the 
noisy data points as $Y = (Y_1, \dots, Y_M)$, $Y_i \in \mathbb{R}^d$, for $1 \le i \le M$. 

The observed points have labels, but the reference points do not. We seek to 
register these noisy and sparse point sets, onto the reference point set. The ultimate 
goal is to identify the ordering of the labels of the points (types of atoms) in a 
configuration. We will find the best assignment and rigid transformation between 
the observed point set and the reference point set. Having completed the registration 
process for all observations in the configuration, we may then construct a three 
dimensional distribution of labeled points around each reference point, and the 
distribution of atomic composition is readily obtained. 

The point-set registration problem has two crucial elements. The first is the 
correspondence, or assignment of each point in the observed set to the reference 
set. The second is the identification of the optimal transformation from within an 
appropriate class of transformations. If the transformation class is taken to be the 
rigid transformations, then each of the individual problems is easily solved by 
itself, and naive methods simply alternate the solution of each individually until 
convergence. 

One of the most frequently used point set registration algorithms is the iterative 
closest point method, which alternates between identifying the optimal transfor- 
mation for a given correspondence, and then corresponding closest points [1]. If 
the transformation is rigid, then both problems are uniquely solvable. If instead we 
replace the naive closest point strategy with the assignment problem, so that any 
two observed points correspond to two different reference points, then again the 
problem can be solved with a linear program [9]. However, when these two solvable 
problems are combined into one, the resulting problem is non-convex [14], and 
no longer admits a unique solution, even for the case of rigid transformations as 
considered here. The same strategy has been proposed with more general non- 
rigid transformations [3], where identification of the optimal transformation is no 
longer analytically solvable. The method in [11] minimizes an upper bound on 
their objective function, and is thus also susceptible to getting stuck in a local 
basin of attraction. We instead take a Bayesian formulation of the problem that 
will simultaneously find the transformation and correspondence between point sets. 
Most importantly, it is designed to avoid local basins of attraction and locate a global 
minimum. 
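The alternating scheme described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes d = 2, maps the observed points onto the reference (the inverse direction of Eq. (1)), and uses the closed-form least-squares rigid fit for paired planar points. The function names `closest_point_assignment`, `best_rigid_2d`, `apply_rigid_2d` and `icp_2d` are hypothetical.

```python
import math


def closest_point_assignment(obs, ref):
    """For each observed point, return the index of the nearest reference point."""
    return [min(range(len(ref)), key=lambda i: math.dist(y, ref[i])) for y in obs]


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src onto dst."""
    n = len(src)
    cs = (sum(p[0] for p in src) / n, sum(p[1] for p in src) / n)
    cd = (sum(p[0] for p in dst) / n, sum(p[1] for p in dst) / n)
    dot = cross = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - cs[0], sy - cs[1]
        bx, by = dx - cd[0], dy - cd[1]
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    phi = math.atan2(cross, dot)  # maximizes the sum of b . (R a)
    c, s = math.cos(phi), math.sin(phi)
    t = (cd[0] - (c * cs[0] - s * cs[1]), cd[1] - (s * cs[0] + c * cs[1]))
    return phi, t


def apply_rigid_2d(points, phi, t):
    """Rotate each point by phi and translate by t."""
    c, s = math.cos(phi), math.sin(phi)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]


def icp_2d(obs, ref, iterations=20):
    """Alternate closest-point matching and rigid fitting for a fixed number of sweeps."""
    current = list(obs)
    for _ in range(iterations):
        idx = closest_point_assignment(current, ref)
        phi, t = best_rigid_2d(current, [ref[i] for i in idx])
        current = apply_rigid_2d(current, phi, t)
    return current, closest_point_assignment(current, ref)
```

Because each sweep only improves the fit for the current matching, a poor initial matching can trap this iteration in a local basin, which is the failure mode discussed in the text.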

We will show how alternating between finding correspondences and minimizing 
distances can lead to an incorrect registration. Consider now the setup in Fig. 1. If we 
correspond closest points first, then all three green points would be assigned to the 
blue ‘1’. Then, identifying the single rigid transformation to minimize the distances 


Fig. 1 Setup for incorrect registration; alternating assignment and minimization 
between all three green points and the blue ‘1’ would yield a local minimum, with no 
correct assignments. If we instead consider assignments, so that no two observation 
points can correspond to the same reference point, then it is again easy to see 
two equivalent solutions by eye. The first is a pure translation, and the second 
can be obtained, for example, by one of two equivalent rotations around the midpoint 
between the ‘1’s, by $\pi$ or $-\pi$. The first only gets the assignment of ‘2’ correct, 
while the second is correct. Note that in reality the reference labels are unknown, 
so both are equivalent for us. Here it is clear what the solutions are, but once the 
problem grows in scale, the answer is not always so clear. This simple illustration 
of degenerate (equal energy) multi-modality of the registration objective function 
arises from physical symmetry of the reference point-set. This will be an important 
consideration for our reference point sets, which will arise as a unit cell of a lattice, 
hence with appropriate symmetry. We will never be able to know the registration 
beyond these symmetries, but this will nonetheless not be the cause of concern, as 
symmetric solutions will be considered equivalent. The troublesome multi-modality 
arises in the presence of noisy and partially observed point sets, where there may be 
local minima with higher energy than the global minima. 

The multi-modality of the combined problem, in addition to the limited infor- 
mation in the noisy and sparse observations, motivates the need for a global 
probabilistic notion of solution for this problem. It is illustrated in the following 
subsection that the problem lends itself naturally to a flexible Bayesian formu- 
lation which circumvents the intrinsic shortcomings of deterministic optimization 
approaches for non-convex problems. Indeed at an additional computational cost, 
we obtain a distribution of solutions, rather than a point estimate, so that general 
quantities of interest are estimated and uncertainty is quantified. In case a single 
point estimate is required we define an appropriate optimal one (for example the 
global energy minimizer or probability maximizer). 


2.1 Bayesian Formulation 


We seek to compute the registration between the observation set and reference set. 
We are concerned primarily with rigid transformations of the form 


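$$\mathcal{T}(X; \theta) = R_\theta X + t_\theta, \qquad (1)$$

where $R_\theta \in \mathbb{R}^{d \times d}$ is a rotation and $t_\theta \in \mathbb{R}^d$ is a translation vector. 
Write $[\mathcal{T}(X; \theta)]_{ki} = \mathcal{T}_k(X_i)$ for $1 \le i \le N$, $1 \le k \le d$, where $X_i$ is the 
$i$th column of $X$. Now let $\xi \in \mathbb{R}^{d \times M}$ with entries $\xi_{ij} \sim N(0, \gamma^2)$, and assume the 
following statistical model 

$$Y = \mathcal{T}(X; \theta)\, C + \xi, \qquad (2)$$

for $\xi$, $\theta$ and $C$ independent. 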
The matrix of correspondences $C \in \{0, 1\}^{N \times M}$ is such that $\sum_{i=1}^{N} C_{ij} = 1$, $1 \le 
j \le M$, and each observation point corresponds to only one reference point. So if 
$X_i$ matches $Y_j$ then $C_{ij} = 1$, otherwise $C_{ij} = 0$. We let $C$ be endowed with a prior, 
$\pi_0(C_{ij} = 1) = \pi_{ij}$ for $1 \le i \le N$ and $1 \le j \le M$. Furthermore, assume a prior 
on the transformation parameter $\theta$ given by $\pi_0(\theta)$. The posterior distribution then 
takes the form 

$$\pi(C, \theta \mid X, Y) \propto \mathcal{L}(Y \mid X, C, \theta)\, \pi_0(C)\, \pi_0(\theta), \qquad (3)$$

where $\mathcal{L}$ is the likelihood function associated with Eq. (1). 
For a given $\theta$, an estimate $\hat{C}$ can be constructed a posteriori by letting $\hat{C}_{i^*(j), j} = 1$ 
for $j = 1, \dots, M$ and zero otherwise, where 

$$i^*(j) = \operatorname*{argmin}_{1 \le i \le N} |Y_j - \mathcal{T}(X_i; \theta)|^2. \qquad (4)$$

For example, $\theta$ may be taken as the maximum a posteriori (MAP) estimator or the 
mean. We note that $\hat{C}$ can be constructed either with a closest point approach, or via 
assignment to avoid multiple registered points assigned to the same reference. 
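The two ways of constructing the estimated correspondence matrix can be contrasted in a short sketch. It assumes the reference points have already been transformed (the list `ref_t` stands for the points $\mathcal{T}(X_i; \theta)$), and the one-to-one variant is a brute-force search over permutations that is only practical for very small point sets; a linear-program or Hungarian solver would be used in practice. All function names are hypothetical.

```python
import itertools
import math


def closest_point(ref_t, obs):
    """Index i*(j) of the nearest transformed reference point for each observation."""
    return [min(range(len(ref_t)), key=lambda i: math.dist(y, ref_t[i])) for y in obs]


def one_to_one(ref_t, obs):
    """Brute-force assignment: distinct reference indices, minimal total squared distance."""
    best = min(
        itertools.permutations(range(len(ref_t)), len(obs)),
        key=lambda perm: sum(math.dist(y, ref_t[i]) ** 2 for y, i in zip(obs, perm)),
    )
    return list(best)


def to_matrix(indices, n_ref):
    """Build the N x M 0/1 matrix with a single 1 in each column."""
    c = [[0] * len(indices) for _ in range(n_ref)]
    for j, i in enumerate(indices):
        c[i][j] = 1
    return c
```

With two observations lying close to the same reference point, `closest_point` assigns both to it, whereas `one_to_one` forces them onto different reference points.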

Lastly, we assume the $j$th observation only depends on the $j$th column of the 
correspondence matrix, and so $Y_i$, $Y_j$ are conditionally independent with respect to 
the matrix $C$ for $i \ne j$. This does not exclude the case where multiple observation 
points are assigned to the same reference point, but as mentioned above such a 
scenario should have zero probability. 

To that end, instead of considering the full joint posterior in Eq. (3) we will focus 
on the marginal of the transformation 


$$\pi(\theta \mid X, Y) \propto \mathcal{L}(Y \mid X, \theta)\, \pi_0(\theta). \qquad (5)$$


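Let $C_j$ denote the $j$th column of $C$. Since $C_j$ is completely determined by the 
single index $i$ at which it takes the value 1, the marginal likelihood takes the form 

$$
\begin{aligned}
\sum_{C} p(Y_j \mid X, \theta, C)\, \pi_0(C) &= \sum_{i=1}^{N} p(Y_j \mid X, \theta, C_{ij} = 1)\, \pi_0(C_{ij} = 1) \\
&= \sum_{i=1}^{N} \pi_{ij}\, p(Y_j \mid X, \theta, C_{ij} = 1) \\
&\propto \sum_{i=1}^{N} \pi_{ij} \exp\left\{ -\frac{1}{2\gamma^2} |Y_j - \mathcal{T}(X_i; \theta)|^2 \right\}. \qquad (6)
\end{aligned}
$$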
The above marginal together with the conditional independence assumption 
allows us to construct the likelihood function of the marginal posterior (5) as follows 

$$
\begin{aligned}
\mathcal{L}(Y \mid X, \theta) &= \prod_{j=1}^{M} p(Y_j \mid X, \theta) \\
&\propto \prod_{j=1}^{M} \sum_{i=1}^{N} \pi_{ij} \exp\left\{ -\frac{1}{2\gamma^2} |Y_j - \mathcal{T}(X_i; \theta)|^2 \right\}. \qquad (7)
\end{aligned}
$$


Thus the posterior in question is 


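$$
\begin{aligned}
\pi(\theta \mid X, Y) &\propto \mathcal{L}(Y \mid X, \theta)\, \pi_0(\theta) \\
&\propto \prod_{j=1}^{M} \sum_{i=1}^{N} \pi_{ij} \exp\left\{ -\frac{1}{2\gamma^2} |Y_j - \mathcal{T}(X_i; \theta)|^2 \right\} \pi_0(\theta). \qquad (8)
\end{aligned}
$$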

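Consider a prior on $\theta$ such that $\pi_0(\theta) \propto \exp(-\lambda R(\theta))$, where $\lambda > 0$. Then we 
have the following objective function 

$$E(\theta) = -\sum_{j=1}^{M} \log \sum_{i=1}^{N} \pi_{ij} \exp\left\{ -\frac{1}{2\gamma^2} |Y_j - \mathcal{T}(X_i; \theta)|^2 \right\} + \lambda R(\theta). \qquad (9)$$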


The minimizer, $\theta^*$, of Eq. (9) is also the maximizer of the a posteriori 
probability under Eq. (8). It is called the maximum a posteriori estimator. This can 
also be viewed as maximum likelihood estimation regularized by $\lambda R(\theta)$. 
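To make the objective function concrete, the following sketch evaluates it for d = 2 with $\theta = (\varphi, t_1, t_2)$. Several choices here are assumptions made only for illustration and are not specified in the text: uniform prior weights $\pi_{ij} = 1/N$ when none are supplied, a quadratic penalty $R(\theta) = |t|^2$, and the names `transform` and `energy`. The inner sum is evaluated with the log-sum-exp trick to avoid underflow when $\gamma$ is small.

```python
import math


def transform(x, phi, t):
    """Planar rigid transformation: rotate x by phi, then translate by t."""
    c, s = math.cos(phi), math.sin(phi)
    return (c * x[0] - s * x[1] + t[0], s * x[0] + c * x[1] + t[1])


def energy(theta, ref, obs, gamma, lam=0.0, weights=None):
    """Negative log marginal posterior for d = 2, up to an additive constant.

    weights[i][j] plays the role of pi_ij and must be strictly positive;
    uniform weights 1/N are assumed when weights is None.
    """
    phi, t = theta[0], (theta[1], theta[2])
    n = len(ref)
    moved = [transform(x, phi, t) for x in ref]
    total = 0.0
    for j, y in enumerate(obs):
        terms = []
        for i, tx in enumerate(moved):
            pi_ij = 1.0 / n if weights is None else weights[i][j]
            terms.append(math.log(pi_ij) - math.dist(y, tx) ** 2 / (2.0 * gamma ** 2))
        m = max(terms)  # log-sum-exp: factor out the largest exponent
        total -= m + math.log(sum(math.exp(a - m) for a in terms))
    # Assumed regularizer R(theta) = |t|^2; the text leaves R unspecified.
    return total + lam * (t[0] ** 2 + t[1] ** 2)
```

Evaluating `energy` at the transformation that generated noise-free observations gives a much smaller value than at a wrong transformation, which is the behaviour a sampler or optimizer exploits.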

By sampling consistently from the posterior, we may estimate quantities of 
interest, such as moments, together with quantified uncertainty. Additionally, we 
may recover other point estimators, such as local and global modes. 


3 Hamiltonian Monte Carlo 


Markov chain Monte Carlo (MCMC) methods are a natural choice for sampling 
from distributions which can be evaluated pointwise up to a normalizing constant, 
such as the posterior (8). Furthermore, MCMC comprises the workhorse of 
Bayesian computation, often appearing as a crucial component of more sophisticated 
sampling algorithms. Formally, an MCMC method simulates a distribution $\mu$ over a state 
space $\Omega$ by producing an ergodic Markov chain $\{\omega_k\}_{k \in \mathbb{N}}$ that has $\mu$ as its invariant 
distribution, i.e. 

$$\frac{1}{K} \sum_{k=1}^{K} g(\omega_k) \to \int_{\Omega} g(\omega)\, \mu(d\omega) = \mathbb{E}_\mu g(\omega), \qquad (10)$$

with probability 1, for $g \in L^1(\mu)$. 
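The Metropolis-Hastings method is a general MCMC method defined by choosing 
$\theta_0 \in \mathrm{supp}(\pi)$ and iterating the following two steps for $k \ge 0$ 

(1) Propose: $\theta^* \sim Q(\theta_k, \cdot)$. 
(2) Accept/reject: Let $\theta_{k+1} = \theta^*$ with probability 

$$\alpha(\theta_k, \theta^*) = \min\left\{ 1, \frac{\pi(\theta^*)\, Q(\theta^*, \theta_k)}{\pi(\theta_k)\, Q(\theta_k, \theta^*)} \right\}$$

and $\theta_{k+1} = \theta_k$ otherwise. 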
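The two steps above can be sketched for the special case of a symmetric Gaussian random-walk proposal, where the proposal terms cancel in the acceptance ratio. This is an illustrative sketch only: the function name `metropolis_hastings`, the step size, and the use of an unnormalized log density as input are assumptions, not details taken from the text.

```python
import math
import random


def metropolis_hastings(log_density, theta0, step, n_steps, rng):
    """Random-walk Metropolis sampler for an unnormalized log density."""
    theta = list(theta0)
    logp = log_density(theta)
    chain = []
    for _ in range(n_steps):
        # Propose: symmetric Gaussian perturbation of the current state.
        proposal = [x + step * rng.gauss(0.0, 1.0) for x in theta]
        logp_new = log_density(proposal)
        # Accept with probability min(1, pi(proposal) / pi(current)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            theta, logp = proposal, logp_new
        chain.append(list(theta))
    return chain
```

For example, `metropolis_hastings(lambda th: -0.5 * th[0] ** 2, [0.0], 2.0, 20000, random.Random(0))` draws approximate samples from a standard normal distribution.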


In general, random-walk proposals Q can result in MCMC chains which 
are slow to explore the state space and susceptible to getting stuck in local 
basins of attraction. Hamiltonian Monte Carlo (HMC) is designed to improve this 
shortcoming. HMC is a Metropolis-Hastings method [4, 13] which incorporates 
gradient information of the log density with a simulation of Hamiltonian dynamics 
to efficiently explore the state space and accept large moves of the Markov chain. 
Heuristically, the gradient yields $d$ pieces of information, for an $\mathbb{R}^d$-valued variable 
and scalar objective function, as compared with one piece of information from the 
objective function alone. Our description here of the HMC algorithm follows that 
of [2] and the necessary foundations of Hamiltonian dynamics for the method can 
be found in [17]. 
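As a worked step, the gradient that HMC requires can be written out by differentiating the objective function of Eq. (9). The symbols $w_{ij}(\theta)$ (normalized weights of reference point $i$ for observation $j$) and $J_i(\theta)$ (the Jacobian of $\mathcal{T}(X_i; \theta)$ with respect to $\theta$) are notation introduced here for illustration and do not appear in the original derivation.

```latex
\[
w_{ij}(\theta) =
\frac{\pi_{ij} \exp\left\{ -\frac{1}{2\gamma^2} |Y_j - \mathcal{T}(X_i; \theta)|^2 \right\}}
     {\sum_{i'=1}^{N} \pi_{i'j} \exp\left\{ -\frac{1}{2\gamma^2} |Y_j - \mathcal{T}(X_{i'}; \theta)|^2 \right\}},
\qquad
\nabla E(\theta) = \frac{1}{\gamma^2} \sum_{j=1}^{M} \sum_{i=1}^{N}
w_{ij}(\theta)\, J_i(\theta)^T \left( \mathcal{T}(X_i; \theta) - Y_j \right)
+ \lambda \nabla R(\theta).
\]
```

The weights sum to one over $i$ for each $j$, so each observation contributes a weighted average of residuals, with most of the weight on the reference points it is currently closest to.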

Our objective here is to sample from a specific target density 


$$\pi(\theta) \propto \exp(-E(\theta)) \qquad (11)$$

over $\theta$, where $E(\theta)$ is as defined in Eq. (9) and $\pi(\theta)$ is of the form given by Eq. (8). 
First, an artificial momentum variable $p \sim N(0, \Gamma)$, independent of $\theta$, is 
included into Eq. (11), for a symmetric positive definite mass matrix $\Gamma$, that is 
usually a scalar multiple of the identity matrix. Define a Hamiltonian now by 

$$H(p, \theta) = E(\theta) + \frac{1}{2} p^T \Gamma^{-1} p,$$

where $E(\theta)$ is the “potential energy” and $\frac{1}{2} p^T \Gamma^{-1} p$ is the “kinetic energy”. 
Hamilton’s equations of motion for $p, \theta \in \mathbb{R}^d$ are, for $i = 1, \dots, d$: 

$$\frac{d\theta_i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial \theta_i}.$$


In practice, the algorithm creates a Markov chain on the joint position-momentum 
space $\mathbb{R}^{2d}$, by alternating between independently sampling from the 
marginal Gaussian on momentum $p$, and numerical integration of Hamiltonian 
dynamics along an energy contour to update the position. If the initial condition 
$\theta \sim \pi$ and we were able to perfectly simulate the dynamics, this would give 
samples from $\pi$ because the Hamiltonian $H$ remains constant along trajectories. 
Due to errors in numerical approximation, the value of H will vary. To ensure 
the samples are indeed drawn from the correct distribution, a Metropolis-Hastings 
accept/reject step is incorporated into the method. 

In particular, after a new momentum is sampled, suppose the chain is in the state 
$(p, \theta)$. Provided the numerical integrator is reversible, the probability of accepting 
the proposed point $(p^*, \theta^*)$ takes the form 

$$\alpha((p, \theta), (p^*, \theta^*)) = \min\{1, \exp\{H(p, \theta) - H(p^*, \theta^*)\}\}. \qquad (12)$$

If $(p^*, \theta^*)$ is rejected, the next state remains unchanged from the previous iteration. 
However, note that a fresh momentum variable is drawn each step, so only $\theta$ 
remains fixed. Indeed the momentum variables can be discarded, as they are only 
auxiliary variables. To be concrete, the algorithm requires an initial state $\theta_0$, a 
reversible numerical integrator, integration step-size $h$, and number of steps $L$. 
Note that reversibility of the integrator is crucial such that the proposal integration 
$Q((p, \theta), (p^*, \theta^*))$ is symmetric and drops out of the acceptance probability in 
Eq. (12). The parameters $h$ and $L$ are tuning parameters, and are described in detail in 
[2, 13]. 
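The HMC algorithm then proceeds as follows: 
HMC: 
for $k \ge 0$ do 
    $p_k \leftarrow \xi$ for $\xi \sim N(0, \Gamma)$ 
    function INTEGRATOR($p_k$, $\theta_k$, $h$) return $(p^*, \theta^*)$ 
    end function 
    $\alpha \leftarrow \min\{1, \exp\{H(p_k, \theta_k) - H(p^*, \theta^*)\}\}$ 
    $\theta_{k+1} \leftarrow \theta^*$ with probability $\alpha$, otherwise 
    $\theta_{k+1} \leftarrow \theta_k$ 
end for 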
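The loop above can be sketched in a few lines. The text only requires a reversible integrator; this sketch assumes the leapfrog (Störmer-Verlet) scheme and an identity mass matrix, and the names `leapfrog` and `hmc`, along with the argument layout, are hypothetical.

```python
import math
import random


def leapfrog(theta, p, grad_e, h, n_steps):
    """Leapfrog integration of Hamilton's equations with identity mass matrix."""
    theta, p = list(theta), list(p)
    g = grad_e(theta)
    for _ in range(n_steps):
        p = [pk - 0.5 * h * gk for pk, gk in zip(p, g)]   # half step in momentum
        theta = [tk + h * pk for tk, pk in zip(theta, p)]  # full step in position
        g = grad_e(theta)
        p = [pk - 0.5 * h * gk for pk, gk in zip(p, g)]   # half step in momentum
    return theta, p


def hmc(e, grad_e, theta0, h, n_leapfrog, n_samples, rng):
    """Hamiltonian Monte Carlo targeting a density proportional to exp(-e(theta))."""

    def hamiltonian(theta, p):
        return e(theta) + 0.5 * sum(pk * pk for pk in p)

    theta, chain = list(theta0), []
    for _ in range(n_samples):
        p = [rng.gauss(0.0, 1.0) for _ in theta]  # fresh momentum every iteration
        theta_new, p_new = leapfrog(theta, p, grad_e, h, n_leapfrog)
        log_alpha = hamiltonian(theta, p) - hamiltonian(theta_new, p_new)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            theta = theta_new  # accept; otherwise the position is kept
        chain.append(list(theta))
    return chain
```

For a standard normal target one would pass `e = lambda th: 0.5 * th[0] ** 2` and `grad_e = lambda th: [th[0]]`; the step size `h` and the number of leapfrog steps are the tuning parameters mentioned in the text.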
Under appropriate assumptions [13], this method will provide samples $\theta_k \sim \pi$, 
such that for bounded $g : \mathbb{R}^d \to \mathbb{R}$ 

$$\frac{1}{K} \sum_{k=1}^{K} g(\theta_k) \to \int g(\theta)\, \pi(\theta)\, d\theta \quad \text{as } K \to \infty.$$


4 Numerical Experiments 


To illustrate our approach, we consider numerical experiments on synthetic datasets 
in $\mathbb{R}^2$ and $\mathbb{R}^3$, with varying levels of noise and percentage of observed data. We 
focus our attention on rigid transformations of the form Eq. (1). 

For all examples here, the $M$ observation points are simulated as $Y_i \sim N(R_\varphi X_{j(i)} + t, \gamma^2 I_d)$, 
for a rotation matrix $R_\varphi$ parameterized by $\varphi$, and some $t$ 
and $\gamma$. So, $\theta = (\varphi, t)$. To simulate the unknown correspondence between the 
reference and observation points, for each $i = 1, \dots, M$, the corresponding index 
$j(i) \in \{1, \dots, N\}$ is chosen randomly and without replacement. Recall that we 
define percentage of observed points here as $p = M/N \in [0, 1]$. We tested various 
percentages of observed data and noise $\gamma$ on the observation set, then computed the 
mean square error (MSE), given by Eq. (13), between the reference points and the 
registered observed points, 


$$\mathcal{E}(\theta) = \frac{1}{M} \sum_{i=1}^{M} \left| R_\varphi^T (Y_i - t) - X_{j(i)} \right|^2. \qquad (13)$$

4.1 Two Dimensional Registration 


First we consider noise-free data, i.e. $\gamma = 0$ (however in the reconstruction 
some small $\gamma > 0$ is used). The completed registration for the 2-dimensional ‘fish’ 
set is shown in Figs. 2 and 3. The ‘fish’ set is a standard benchmark test case 
for registration algorithms in $\mathbb{R}^2$ [6, 12]. Our methodology, employing the HMC 
sampler described in Sect. 3 allows for a correct registration, even in the case where 
we have only 33% of the initial data, see Fig. 3. 

As a final experiment with the ‘fish’ dataset, we took 25 i.i.d. realizations of 
the reference, all having the same transformation, noise, and percent observed. 
Since we have formulated the solution of our registration problem as a density, 
we may compute moments, and find other quantities of interest. In this experiment 
we evaluate $\bar{\theta} = \frac{1}{25} \sum_{i=1}^{25} \hat{\theta}_i$, where $\hat{\theta}_i$ is our MAP estimator of $\theta$ from the HMC 
algorithm. We then evaluated the transformation under $\bar{\theta}$. The completed registration 
Fig. 3 33% Observed data 


is shown in Fig. 4. With a relatively small number of configurations, we are able to 
accurately reconstruct the data, despite the noisy observations. 


4.2 Synthetic APT Data 


The datasets from APT experiments are perturbed by additive noise on each of 
the points. The variance of this additive noise is not known in general, and so in 
practice it should be taken as a hyper-parameter, endowed with a hyper-prior, and 
inferred or optimized. It is known that the size of the displacement is on the order 
of several Å (ångströms), so that provides a good basis for the choice of hyper-prior. 
Fig. 4 Full data, $\gamma = 0.5$, average of 25 registrations 


Fig. 5 Example APT data: Left: Hidden truth, Center: Noise added, Right: Missing atoms colored 
grey 


In order to simulate this uncertainty in our experiments, we incorporated additive 
noise in the form of a truncated Gaussian, to keep all the mass within several Å. The 
experiments consider a range of variances in order to measure the impact of noise 
on our registration process. 

In our initial experiments with synthetic data, we have chosen percentages of 
observed data and additive noise similar to what experimental materials scientists 
have reported in their APT datasets. The percent observed of these experimental 
datasets is approximately 33%. The added noise of these APT datasets is harder 
to quantify. Empirically, we expect the noise to be Gaussian in form, truncated to 
be within 1-3 Å. The standard deviation of the added noise is less well-known, 
so we will work with different values to assess the method’s performance. With 
respect to the size of the cell, a displacement of 3 Å is significant. Consider the 
cell representing the hidden truth in Fig. 5. The distance between the front left and 
right corners is on the scale of 3 Å. Consequently a standard deviation of 0.5 for the 
additive noise represents a significant displacement of the atoms. 
Table 1 $\mathcal{E}(\theta)$ registration errors 


Standard deviation | Percent observed | Registration error 

0.0                | 75%              | 3.49368609883352e-11 
0.0                | 45%              | 4.40071892313178e-11 
0.25               | 75%              | 0.1702529649951198 
0.25               | 45%              | 0.1221555853433331 
0.5                | 75%              | 0.3445684328735114 
0.5                | 45%              | 0.3643178111314804 


As a visual example, the images in Fig.5 are our synthetic test data used to 
simulate the noise and missing data from the APT datasets. The leftmost image in 
Fig. 5 is the hidden truth we seek to uncover. The middle image is the first with noise 
added to the atom positions. Lastly, in the right-most image we have “ghosted” some 
atoms, by coloring them grey, to give a better visual representation of the missing 
data. In these representations of HEAs, a color different from grey denotes a distinct 
type of atom. What we seek is to infer the chemical ordering and atomic structure 
of the left image, from transformed versions of the right, where $\gamma = 0.5$. 

For our initial numerical experiments with simulated APT data, we choose a 
single reference and observation, and consider two different percentages of observed 
data, 75% and 45%. For both levels of observations in the data, we looked at 
results with three different levels of added noise on the atomic positions: no noise, 
and Gaussian noise with standard deviation of 0.25 and 0.5. The MSE of the 
processes are shown in Table 1. We initially observe the method is able, within 
an appreciably small tolerance, to find the exact parameter $\theta$ in the case of no noise, 
with both percentages of observed data. In the other cases, as expected, the error 
scales with the noise. This follows from our model, as we are considering a rigid 
transformation between the observation and reference, which is a volume preserving 
transformation. If the exact transformation is used with an infinite number of points, 
then the RMSE (square root of Eq. (13)) is $\gamma$. 

Now we make the simplifying assumption that the entire configuration corre- 
sponds to the same reference, and each observation in the configuration corresponds 
to the same transformation applied to the reference, with independent, identically 
distributed (i.i.d.) noise added to it. This enables us to approximate the mean and 
variance of Eq. (13) over these observation realizations, i.e. we obtain a collection 
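$\{\mathcal{E}^l(\theta^l)\}_{l=1}^{L}$ of errors, where $\mathcal{E}^l(\theta^l)$ is the MSE corresponding to substituting $Y^l$ and 
its estimated registration parameters $\theta^l$ into Eq. (13), where $L$ is the total number 
of completed registrations. The statistics of this collection of values provide robust 
estimates of the expected error for a single such registration, and the variance we 
can expect over realizations of the observational noise. In other words 

$$\mathbb{E}^L \mathcal{E}(\theta) := \frac{1}{L} \sum_{l=1}^{L} \mathcal{E}^l(\theta^l) \quad \text{and} \quad \mathbb{V}^L \mathcal{E}(\theta) := \frac{1}{L} \sum_{l=1}^{L} \left( \mathcal{E}^l(\theta^l) - \mathbb{E}^L \mathcal{E}(\theta) \right)^2. \qquad (14)$$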


Bayesian Point Set Registration 


Fig. 6 Blue: Full data, Red: Noiseless data (horizontal axis: $\mathcal{E}(\theta)$ registration error; legend: Percent Observed = 100, Standard Dev = 0.00) 

Fig. 7 Blue: 90% Observed, Red: $\gamma = 0.1$ 


We have confidence intervals as well, corresponding to a central limit theorem 
approximation based on these L samples. 
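The sample mean, sample variance and a central-limit-theorem interval for a collection of registration errors can be computed as in the sketch below. The 95% level (z = 1.96) and the function name `error_statistics` are assumptions for illustration; the text does not state which level was used.

```python
import math


def error_statistics(errors, z=1.96):
    """Return the mean, the variance (1/L normalization) and a CLT interval for the mean."""
    count = len(errors)
    mean = sum(errors) / count
    var = sum((e - mean) ** 2 for e in errors) / count
    half_width = z * math.sqrt(var / count)  # normal approximation for the mean
    return mean, var, (mean - half_width, mean + half_width)
```

Calling `error_statistics` on the list of per-registration errors for one noise and percent-observed combination yields the quantities that would be compared across combinations.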

In Figs. 6, 7, 8, and 9 we computed the registration for $L = 125$ i.i.d. observation 
sets corresponding to the same reference, for each combination of noise and percent 
observed data. We then averaged all 125 registration errors for a fixed noise/percent 
observed combination, as in Eq. (14), and compared the values. What we observe in 
Figs. 6, 7, 8, and 9 is the registration error scaling with the noise, which is expected. 
What is interesting to note here is that the registration error is essentially constant 
with respect to the percentage of observed data, for a fixed standard deviation 
of the noise. More information will lead to a lower variance in the posterior on 
the transformation 0, following from standard statistical intuition. However, the 


Fig. 8 Blue: 75% Observed, Red: $\gamma = 0.25$ (horizontal axis: $\mathcal{E}(\theta)$ registration error; legend: Percent Observed = 75, Standard Dev = 0.25) 

Fig. 9 Blue: 50% Observed, Red: $\gamma = 0.5$ 


important point to note is that, as mentioned above, for the exact transformation and 
infinitely many points, (13) will equal $\gamma^2$. So, for a sufficiently accurate transformation, one 
can expect a sample approximation thereof. Sufficient accuracy is found here with 
very few observed points, which is reasonable considering that in the zero noise case 
2 points are sufficient to fit the 6 parameters exactly. 

The MSE registration errors shown in Figs. 6, 7, 8, and 9, show the error remains 
essentially constant with respect to the percent observed. Consequently, if we 
consider only Fig.7, we observe that the blue and red lines intersect, when the 
blue has a standard deviation of 0.1, and the associated MSE is approximately 0.05. 
This same error estimate holds for all tested percentages of observed data having a 
standard deviation of 0.1. Similar results hold for other combinations of noise and 
percent observed, when the noise is fixed. 
Fig. 10 Blue: Full data, Red: Noiseless data (MALA) 
Fig. 11 Blue: 90% Observed, Red: $\gamma = 0.1$ (MALA) 


Furthermore, the results shown in Figs.6, 7, 8, and 9 are independent of the 
algorithm, as the plots in Figs. 10 and 11 show. For the latter, we ran a similar 
experiment with 125 i.i.d. observation sets, but to compute the registration, we used 
the Metropolis Adjusted Langevin Algorithm (MALA) [15], as opposed to HMC in 
Figs. 6, 7, 8, and 9. Both algorithms solve the same problem and use information 
from the gradient of the log density. In the plots shown in Figs.6, 7, 8, and 9, 
we see the same constant error with respect to the percent observed and the error 
increasing with the noise, for a fixed percent observed. The MSE also appears to be 
proportional to $\gamma^2$, which is expected, until some saturation threshold of $\gamma > 0.5$ 
or so. This can be understood as a threshold beyond which the observed points will 
tend to get assigned to the wrong reference point. 
To examine the contours of our posterior described by Eq. (8), we drew $10^5$ 
samples from the density using the HMC methodology described previously. For 
this simulation we set the noise to have standard deviation of 0.25 and the percent 
observed was 35%, similar values to what we expect from real APT datasets. 
The rotation matrix $R$ is constructed via Euler angles denoted $\varphi_x, \varphi_y, \varphi_z$, where 
$\varphi_x \in [0, 2\pi)$, $\varphi_y \in [-\frac{\pi}{2}, \frac{\pi}{2}]$ and $\varphi_z \in [0, 2\pi)$. These parameters are especially 
important to making the correct atomic identification, which is crucial to the success 
of our method. 
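A rotation matrix can be assembled from three Euler angles as in the sketch below. The composition order is not stated in the text, so the convention used here, $R = R_z(\varphi_z) R_y(\varphi_y) R_x(\varphi_x)$, is an assumption, as are the function names.

```python
import math


def matmul3(a, b):
    """Product of two 3 x 3 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def rotation_from_euler(phi_x, phi_y, phi_z):
    """Rotation matrix Rz(phi_z) Ry(phi_y) Rx(phi_x); the order is an assumed convention."""
    cx, sx = math.cos(phi_x), math.sin(phi_x)
    cy, sy = math.cos(phi_y), math.sin(phi_y)
    cz, sz = math.cos(phi_z), math.sin(phi_z)
    rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(rz, matmul3(ry, rx))


def apply_rotation(r, x):
    """Apply a 3 x 3 rotation matrix to a point in three dimensions."""
    return tuple(sum(r[i][k] * x[k] for k in range(3)) for i in range(3))
```

Restricting $\varphi_y$ to $[-\frac{\pi}{2}, \frac{\pi}{2}]$, as in the text, keeps the parameterization free of redundant angle triples away from the endpoints.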

In Figs. 12, 13, and 14, we present marginal single variable histograms and 
all combinations of marginal two-variable joint histograms for the individual 
components of $\theta$. We observe multiple modes in a number of the marginals. In 
Figs. 15, 16, 17, 18, 19, and 20 we present autocorrelation and trace plots for the 
rotation parameters from the same instance of the HMC algorithm as presented in 
the histograms above in Figs. 12, 13, and 14. We focus specifically on the rotation 
angles, to ensure efficient mixing of the Markov chain as these have thus far been 
more difficult for the algorithm to optimize. We see the chain is mixing well with 
respect to these parameters and appears not to become stuck in local basins of 
attraction. 
Fig. 12 Histograms of $\varphi$ parameters, 100,000 samples, $\gamma = 0.25$, Observed = 35% 
Additionally, we consider the following. Define empty sets $A_1, \dots, A_N$. For each 


and increment Aj«(j,4 = Aj*(j, U Y 3 This provides a distribution of registered 
points for each index $i$, $A_i$, from which we estimate various statistics such as mean
and variance. However, note that the cardinality varies between $|A_i| \in \{0, \ldots, L\}$.
We are only concerned with statistics around reference points $i$ such that
$|A_i| > L/10$ or so, assuming that the other reference points correspond to outliers
which were registered to by accident. Around each of these $N' < N$ reference points
$X_i$, we have a distribution of some $K < L$ registered points. We then compute
the mean of these $K$ points, denoted by $\bar{X}_i$, and finally we compute the MSE
$\frac{1}{N'} \sum_{i=1}^{N'} |X_i - \bar{X}_i|^2$. The RMSE is reported in Table 2. Here we note that a lower
percentage observed p is correlated with a larger error. Coupling correct inferences
about spatial alignment with an ability to find distributions of atoms around each
lattice point is a transformative tool for understanding High Entropy Alloys.
5 Conclusion


We have presented a statistical model and methodology for point set registration. 
We are able to recover a good estimate of the correspondence and spatial alignment 
between point sets in $\mathbb{R}^2$ and $\mathbb{R}^3$ despite missing data and added noise. As a
continuation of this work, we will extend the Bayesian framework presented in 
Sect. 2.1 to incorporate the case of an unknown reference. In such a setting, we will 
seek not only the correct spatial alignment and correspondence, but the reference 
point set, or crystal structure. The efficiency of our algorithm could be improved 
through a tempering scheme, allowing for easier transitions between modes, or an 
adaptive HMC scheme, where the chain learns about the sample space in order to
make more efficient moves. 

Being able to recover the alignment and correspondences with an unknown
reference will give materials science researchers an unprecedented tool for making
accurate predictions about High Entropy Alloys. Researchers working in the
field will be able to determine the atomic-level structure and chemical ordering
of High Entropy Alloys. From such information, materials scientists will have
the necessary tools to develop classical interaction potentials, which are crucial for
molecular dynamics simulations and for designing these complex materials.
Table 2 Errors for 125 completed registrations

Standard deviation    Percent observed    Error
0.25                  75%                 0.04909611134835241
0.5                   75%                 0.07934531875006196
0.25                  45%                 0.07460005923988245
0.5                   45%                 0.11978598998930728
Optimization Methods for Inverse Problems


Nan Ye, Farbod Roosta-Khorasani, and Tiangang Cui 


Abstract Optimization plays an important role in solving many inverse problems. 
Indeed, the task of inversion often either involves or is fully cast as a solution 
of an optimization problem. In this light, the mere non-linear, non-convex, and 
large-scale nature of many of these inversions gives rise to some very challenging 
optimization problems. The inverse problem community has long been developing 
various techniques for solving such optimization tasks. However, other, seemingly 
disjoint communities, such as that of machine learning, have developed, almost in 
parallel, interesting alternative methods which might have stayed under the radar of 
the inverse problem community. In this survey, we aim to change that. In doing so, 
we first discuss current state-of-the-art optimization methods widely used in inverse 
problems. We then survey recent related advances in addressing similar challenges 
in problems faced by the machine learning community, and discuss their potential 
advantages for solving inverse problems. By highlighting the similarities among 
the optimization challenges faced by the inverse problem and the machine learning 
communities, we hope that this survey can serve as a bridge in bringing together 
these two communities and encourage cross-fertilization of ideas.


1 Introduction 


Inverse problems arise in many applications in science and engineering. The term 
“inverse problem” is generally understood as the problem of finding a specific 
physical property, or properties, of the medium under investigation, using indirect 
measurements. This is a highly important field of applied mathematics and scientific 
computing, as to a great extent, it forms the backbone of modern science and 
engineering. Examples of inverse problems can be found in various fields within
medical imaging [6, 7, 12, 71, 95] and several areas of geophysics including mineral 
and oil exploration [8, 18, 74, 96]. 

In general, an inverse problem aims at recovering the unknown underlying
parameters of a physical system which produces the available observations or
measurements. Such problems are generally ill-posed [52]. They are often
solved via one of two approaches: a Bayesian approach, which computes a posterior
distribution of the models given prior knowledge and the data, or a regularized data
fitting approach, which chooses an optimal model by minimizing an objective that
takes into account both fitness to data and prior knowledge. The Bayesian approach
can be used for a variety of downstream inference tasks, such as computing credible
intervals for the parameters, but it is generally more computationally expensive than
the data fitting approach. The computational attractiveness of data fitting comes at a cost: it
can only produce a “point” estimate of the unknown parameters. However, in many
applications, such a point estimate can be more than adequate.

In this review, we focus on the data fitting approach. The approach consists of
four building blocks: a parametric model of the underlying physical phenomenon, a
forward solver that predicts the observation given the model parameters, an objective
function measuring how well a model fits the observation, and an optimization
algorithm for finding model parameters optimizing the objective function. The first
three components together conceptually define what an optimal model is, and
the optimization algorithm provides a computational means to find the optimal
model (which usually requires solving the forward problem during optimization). Each
of these four building blocks is an active area of research. This paper focuses
on the optimization algorithms. While much work has been done on the
subject, many challenges remain, including scaling up to large-scale
problems and dealing with non-convexity. On the other hand, optimization also
constitutes a backbone of machine learning [17, 32]. Consequently, there are
many related developments in optimization from the machine learning community.
However, thus far and rather independently, the machine learning and the inverse
problems communities have largely developed their own sets of tools and algorithms
to address their respective optimization challenges. It stands to reason that
many of the recent advances in machine learning are potentially applicable
to addressing challenges in solving inverse problems. We aim to bring out this
connection and encourage permeation of ideas across these two communities.

In Sect. 2, we present general formulations for the inverse problem, some 
typical inverse problems, and optimization algorithms commonly used to solve 
the data fitting problem. We discuss recent advances in optimization in Sect. 3. 
We then discuss areas in which cross-fertilization of optimization and inverse 
problems can be beneficial in Sect. 4. We conclude in Sect. 5. We remark that our
review of these recent developments focuses on iterative algorithms using gradient
and/or Hessian information to update the current solution. We do not examine global
optimization methods, such as genetic algorithms, simulated annealing, and particle
swarm optimization, which have also received increasing attention recently (e.g.
see [98]).
2 Inverse Problems


An inverse problem can be seen as the reverse process of a forward problem, which
is concerned with predicting the outcome of some measurements given a complete
description of a physical system. Mathematically, a physical system is often
specified using a set of model parameters m whose values completely characterize
the system. The model space $\mathcal{M}$ is the set of possible values of m. While m usually
arises as a function, in practice it is often discretized as a parameter vector for
ease of computation, typically using the finite element method, the finite volume
method, or the finite difference method. The forward problem can be denoted as


$$m \mapsto d = f(m), \tag{1}$$


where d are the error-free predictions, and the above notation is a shorthand for
$d = (d_1, \ldots, d_s) = (f_1(m), \ldots, f_s(m))$, with $d_i \in \mathbb{R}^l$ being the i-th measurement.
The function f represents the physical theory used for the prediction and is called
the forward operator. The observed outcomes contain noise and relate to the system
via the observation equation


$$d = f(m) + \eta, \tag{2}$$


where $\eta$ is the noise in the measurements. The inverse problem aims to
recover the model parameters m from such noisy measurements.

The inverse problem is almost always ill-posed, because the same measurements 
can often be predicted by different models. There are two main approaches to deal 
with this issue. The Bayesian approach assumes a prior distribution P(m) on the
model and a conditional distribution $P(\eta \mid m)$ on noise given the model. The latter is
equivalent to a conditional distribution P(d | m) on measurements given the model.
Given some measurements d, a posterior distribution P(m | d) on the models is
then computed using Bayes' rule


$$P(m \mid d) \propto P(m) P(d \mid m). \tag{3}$$


Another approach sees the inverse problem as a data fitting problem that finds a
parameter vector m that gives predictions f(m) that best fit the observed outcomes
d in some sense. This is often cast as an optimization problem


$$\min_{m} \psi(m, d), \tag{4}$$


where the misfit function $\psi$ measures how well the model m fits the data d.
When there is a probabilistic model of d given m, a typical choice of $\psi(m, d)$
is the negative log-likelihood. Regularization is often used to address the issue
of multiple solutions, and additionally has the benefit of stabilizing the solution,
that is, the solution is less likely to change significantly in the presence of outliers
[5, 36, 111]. Regularization incorporates some a priori information on m in the form
of a regularizer R(m) and solves the regularized optimization problem


$$\min_{m \in \mathcal{M}} \psi_{R,\alpha}(m, d) := \psi(m, d) + \alpha R(m), \tag{5}$$


where $\alpha > 0$ is a constant that controls the tradeoff between prior knowledge and
the fitness to data. The regularizer R(m) encodes a preference over the models, 
with preferred models having smaller R values. The formulation in Eq. (5) can 
often be given a maximum a posteriori (MAP) interpretation within the Bayesian 
framework [97]. Implicit regularization also exists in which there is no explicit term 
R(m) in the objective [53, 54, 86, 87, 107, 109]. 
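
To make the MAP interpretation concrete, the following short derivation is a sketch under assumptions that are not stated in the text above: Gaussian noise with covariance $\Sigma$ and a Gaussian prior with mean $m_0$ and covariance $\Sigma_0$. The symbols $m_0$ and $\Sigma_0$ are introduced here purely for illustration.

```latex
\begin{align*}
-\log P(m \mid d)
  &= -\log P(d \mid m) - \log P(m) + \text{const} \\
  &= \tfrac{1}{2}\,(f(m)-d)^{T}\,\Sigma^{-1}\,(f(m)-d)
   + \tfrac{1}{2}\,(m-m_0)^{T}\,\Sigma_0^{-1}\,(m-m_0) + \text{const}.
\end{align*}
```

Up to the additive constant, this has the form of Eq. (5): the first term plays the role of the misfit (the negative log-likelihood) and the second that of the regularizer, here with tradeoff constant 1. Replacing the prior covariance by $\Sigma_0$ divided by the tradeoff constant reproduces a general weight on the regularizer, so a tighter prior corresponds to heavier regularization.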

The misfit function often has the form $\phi(f(m), d)$, which measures the difference
between the prediction f(m) and the observation d. For example, $\phi$ may be chosen to
be the Euclidean distance between f(m) and d. In this case, the regularized problem
takes the form


$$\min_{m \in \mathcal{M}} \phi_{R,\alpha}(m, d) := \phi(f(m), d) + \alpha R(m). \tag{6}$$


This can also be equivalently formulated as choosing the most preferred model
satisfying constraints on its predictions


$$\min_{m \in \mathcal{M}} R(m), \quad \text{s.t.} \quad \phi(f(m), d) \le \rho. \tag{7}$$


The constant $\rho$ usually relates to noise and the maximum discrepancy between the
measured and the predicted data, and can be more intuitive than $\alpha$.


2.1 PDE-Constrained Inverse Problems 


For many inverse problems in science and engineering, the forward model is not
given explicitly via a forward operator f(m), but often conveniently specified via a
set of partial differential equations (PDEs). For such problems, Eq. (6) has the form


$$\min_{m \in \mathcal{M},\, u} \phi(P \cdot u, d) + \alpha R(m), \quad \text{s.t.} \quad c_i(m, u_i) = 0, \quad i = 1, \ldots, s, \tag{8}$$


where $P \cdot u = (P_1, \ldots, P_s) \cdot (u_1, \ldots, u_s) = (P_1 u_1, \ldots, P_s u_s)$ with $u_i$ being the
field in the i-th experiment, $P_i$ being the projection operator that selects fields at
measurement locations in $d_i$ (that is, $P_i u_i$ are the predicted values at locations
measured in $d_i$), and $c_i(m, u_i) = 0$ corresponds to the forward model in the i-th
experiment. In practice, the forward model can often be written as


$$\mathcal{L}_i(m) u_i = q_i, \quad i = 1, \ldots, s, \tag{9}$$
where $\mathcal{L}_i(m)$ is a differential operator, and $q_i$ is a term that incorporates source
terms and boundary values.

The fields $u_1, \ldots, u_s$ in Eqs. (8) and (9) are generally functions in two or
three dimensional spaces, and finding closed-form solutions is usually not possible.
Instead, the PDE-constrained inverse problem is often solved numerically by
discretizing Eqs. (8) and (9) using the finite element method, the finite volume
method, or the finite difference method. Often the discretized PDE-constrained
inverse problem takes the form


$$\min_{m \in \mathcal{M},\, u} \phi(P u, d) + \alpha R(m), \quad \text{s.t.} \quad L_i(m) u_i = q_i, \quad i = 1, \ldots, s, \tag{10}$$


where P is a block-diagonal matrix consisting of diagonal blocks $P_1, \ldots, P_s$
representing the discretized projection operators, u is the concatenation of the
vectors $u_1, \ldots, u_s$ representing the discretized fields, and each $L_i(m)$ is a square,
non-singular matrix representing the differential operator $\mathcal{L}_i(m)$. Each $L_i(m)$ is
typically large and sparse. We abuse the notations P, u to represent both functions
and their discretized versions, but the meanings of these notations will be clear from
context.

The constrained problem in Eq. (10) can be written in an unconstrained form by
eliminating u using $u_i = L_i^{-1} q_i$:


$$\min_{m \in \mathcal{M}} \phi(P L^{-1}(m) q, d) + \alpha R(m), \tag{11}$$

where L is the block-diagonal matrix with $L_1, \ldots, L_s$ as the diagonal blocks, and
q is the concatenation of $q_1, \ldots, q_s$. Note that, as in the case of (6), here we have
$f(m) = P L^{-1}(m) q$.

Both the constrained and unconstrained formulations are used in practice. The
constrained formulation can be solved using the method of Lagrange multipliers.
This does not require explicitly solving the forward problem as in the unconstrained
formulation. However, the problem size increases, and the problem becomes one of
finding a saddle point of the Lagrangian, instead of finding a minimum as in the
unconstrained formulation.
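
As a minimal sketch of the unconstrained formulation, the code below evaluates a forward operator of the form f(m) = P L(m)^{-1} q for a toy problem that is an assumption of this example, not taken from the text: the 1-D equation -m u'' = 1 on (0, 1) with zero boundary values, a scalar parameter m, a second-order finite difference discretization, and three hypothetical receiver locations. The function names are illustrative only.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    lower[i] and upper[i] are the sub- and super-diagonal entries of row i;
    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    y = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    y[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        y[i] = (rhs[i] - lower[i] * y[i - 1]) / denom
    x = [0.0] * n
    x[-1] = y[-1]
    for i in range(n - 2, -1, -1):
        x[i] = y[i] - c[i] * x[i + 1]
    return x


def forward(m, n=9, receivers=(2, 4, 6)):
    """Toy forward operator f(m) = P L(m)^{-1} q for -m u'' = 1, u(0) = u(1) = 0."""
    h = 1.0 / (n + 1)
    scale = m / h**2
    # L(m) = (m / h^2) * tridiag(-1, 2, -1) is linear in m ...
    lower = [-scale] * n
    diag = [2.0 * scale] * n
    upper = [-scale] * n
    q = [1.0] * n
    u = solve_tridiagonal(lower, diag, upper, q)  # u = L(m)^{-1} q
    # ... and P simply selects the field values at the receiver indices.
    return [u[i] for i in receivers]


print(forward(1.0))
print(forward(2.0))
```

In this toy setting the discretized operator is linear in m, yet doubling m halves the predicted data, so f is not linear in m. Each evaluation of f costs one linear solve, which is the price the unconstrained formulation pays for eliminating u.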


2.2 Image Reconstruction 


Image reconstruction studies the creation of 2-D and 3-D images from sets of
1-D projections. The 1-D projections are generally line integrals of a function
representing the image to be reconstructed. In the 2-D case, given an image function
f(x, y), the integral along the line at a distance of s away from the origin and having
a normal which forms an angle $\phi$ with the x-axis is given by the Radon transform


$$p(s, \phi) = \int_{-\infty}^{\infty} f(z \sin\phi + s \cos\phi, \; -z \cos\phi + s \sin\phi) \, dz. \tag{12}$$
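
The line integral in Eq. (12) can be approximated numerically. The sketch below uses a midpoint rule on a truncated integration range and, as an assumed test image, the indicator function of the unit disk, for which the exact projection is 2 sqrt(1 - s^2) for |s| < 1 regardless of the angle. The function names and the truncation length are choices made for this illustration.

```python
import math


def radon_line_integral(f, s, phi, half_length=2.0, n=4000):
    """Midpoint-rule approximation of Eq. (12), truncated to |z| <= half_length."""
    dz = 2.0 * half_length / n
    total = 0.0
    for j in range(n):
        z = -half_length + (j + 0.5) * dz
        x = z * math.sin(phi) + s * math.cos(phi)
        y = -z * math.cos(phi) + s * math.sin(phi)
        total += f(x, y)
    return total * dz


def unit_disk(x, y):
    """Indicator function of the unit disk, used as a simple test image."""
    return 1.0 if x * x + y * y <= 1.0 else 0.0


for angle in (0.0, 0.7, 2.0):
    approx = radon_line_integral(unit_disk, 0.5, angle)
    exact = 2.0 * math.sqrt(1.0 - 0.5**2)
    print(angle, approx, exact)
```

Because the test image is rotationally symmetric, the projection does not depend on the angle, which gives a simple consistency check for the parametrization of the line.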
Reconstruction is often done via back projection, filtered back projection, or
iterative methods [55, 77]. Back projection is the simplest but often results in a blurred
reconstruction. Filtered back projection (FBP) is the analytical inversion of the
Radon transform and generally yields reconstructions of much better quality than
back projection. However, FBP may be infeasible in the presence of discontinuities
or noise. Iterative methods take noise into account, by assuming a distribution for
the noise. The objective function is often chosen to be a regularized likelihood of the
observation, which is then iteratively optimized using the expectation maximization
(EM) algorithm.


2.3 Objective Function


One of the most commonly used objective functions is the least squares criterion,
which uses a quadratic loss and a quadratic regularizer. Assume that the noise
for each experiment in (2) is independent and normally distributed, i.e.,
$\eta_i \sim \mathcal{N}(0, \Sigma_i)$, $\forall i$, where $\Sigma_i \in \mathbb{R}^{l \times l}$ is the covariance matrix. Let $\Sigma$ be the
block-diagonal matrix with $\Sigma_1, \ldots, \Sigma_s$ as the diagonal blocks. The standard maximum
likelihood (ML) approach [97] leads to minimizing the least squares (LS) misfit
function


$$\phi(m) := \|f(m) - d\|^2_{\Sigma^{-1}}, \tag{13}$$

where the norm $\|x\|_A = \sqrt{x^T A x}$ is a generalization of the Euclidean norm
(assuming the matrix A is positive definite, which is true in the case of $\Sigma^{-1}$). In the
above equation, we simply write the general misfit function $\phi(f(m), d)$ as $\phi(m)$ by
taking the measurements d as fixed and omitting it from the notation. As previously
discussed, we often minimize a regularized misfit function


$$\phi_{R,\alpha}(m) := \phi(m) + \alpha R(m). \tag{14}$$


The prior R(m) is often chosen as a Gaussian regularizer
$R(m) = (m - m_{\text{prior}})^T \Sigma_m^{-1} (m - m_{\text{prior}})$. We can also write the above optimization problem as
minimizing R(m) under the constraints


$$\sum_{i=1}^{s} \|f_i(m) - d_i\| \le \rho. \tag{15}$$


The least-squares criterion belongs to the class of $\ell_p$-norm criteria, which
contains two other commonly used criteria: the least-absolute-values criterion and
the minimax criterion [104]. These correspond to the use of the $\ell_1$-norm and the
$\ell_\infty$-norm for the misfit function, while the least squares criterion uses the $\ell_2$-norm.
Specifically, the least-absolute-values criterion takes $\phi(m) := \|f(m) - d\|_1$, and the
minimax criterion takes $\phi(m) := \|f(m) - d\|_\infty$. More generally, each coordinate
in the difference may be weighted. The $\ell_1$ solution is more robust (that is, less
sensitive to outliers) than the $\ell_2$ solution, which is in turn more robust than the $\ell_\infty$
solution [25]. The $\ell_\infty$ norm is desirable when outliers are uncommon but the data
are corrupted by uniform noise such as quantization errors [26].

Besides the $\ell_2$ regularizer discussed above, the $\ell_1$-norm is often used too.
The $\ell_1$ regularizer induces sparsity in the model parameters, that is, heavier $\ell_1$
regularization leads to fewer non-zero model parameters.
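
The robustness ordering of the three criteria can be seen on a toy problem. The sketch below assumes the simplest possible model, a single scalar m that should match every measurement, for which the three misfit minimizers have closed forms: the median (least absolute values), the mean (least squares), and the midrange (minimax). The data values are made up for illustration.

```python
import statistics


def fit_constant(data):
    """Closed-form minimizers of ||m * 1 - d||_p over a scalar m for p = 1, 2, inf."""
    return {
        "l1": statistics.median(data),           # least absolute values
        "l2": statistics.fmean(data),            # least squares
        "linf": (min(data) + max(data)) / 2.0,   # minimax
    }


clean = [0.9, 1.0, 1.0, 1.1, 1.0]
with_outlier = clean[:-1] + [11.0]  # replace the last measurement by an outlier

before = fit_constant(clean)
after = fit_constant(with_outlier)
# How far each estimate moves when a single measurement is corrupted.
shift = {name: abs(after[name] - before[name]) for name in before}
print(shift)
```

A single corrupted measurement leaves the median in place, moves the mean by an amount proportional to the outlier divided by the number of measurements, and moves the midrange by half the change in the extreme value.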


2.4 Optimization Algorithms 


Various optimization techniques can be used to solve the regularized data fitting 
problem. We focus on iterative algorithms for nonlinear optimization below as 
the objective functions are generally nonlinear. In some cases, the optimization 
problem can be transformed to a linear program. For example, linear programming 
can be used to solve the least-absolute-values criterion or the minimax criterion. 
However, linear programming is considered to have no advantage over gradient-based
methods (see Section 4.4.2 in [104]), and thus we do not discuss such methods
here. Nevertheless, there are still many optimization algorithms that could be covered
here, and we refer the readers to [13, 80].

For simplicity of presentation, we consider the problem of minimizing a function
g(m). We consider iterative algorithms which start with an iterate $m_0$, and compute
new iterates using


$$m_{k+1} = m_k + \lambda_k p_k, \tag{16}$$


where $p_k$ is a search direction, and $\lambda_k$ a step size. Unless otherwise stated, we
focus on unconstrained optimization. These algorithms can be used to directly solve
the inverse problem in Eq. (5). We only present a selected subset of the algorithms
available and have to omit many other interesting algorithms.


Newton-Type Methods The classical Newton’s method starts with an initial iterate
$m_0$, and computes new iterates using


$$m_{k+1} = m_k - \left(\nabla^2 g(m_k)\right)^{-1} \nabla g(m_k), \tag{17}$$


that is, the search direction is $p_k = -\left(\nabla^2 g(m_k)\right)^{-1} \nabla g(m_k)$, and the step length
is $\lambda_k = 1$. The basic Newton’s method has a quadratic local convergence rate in a
small neighborhood of a local minimum. However, computing the search direction
$p_k$ can be very expensive, and thus many variants have been developed. In addition,
in non-convex problems, the classical Newton direction might not exist (if the Hessian
matrix is not invertible) or it might not be an appropriate direction for descent (if
the Hessian matrix is not positive definite).
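
The local quadratic convergence can be observed on a scalar example. The sketch below applies the Newton update with unit step length to the convex function g(m) = exp(m) - 2m, chosen here only for illustration, whose minimizer is log 2. In several dimensions the division would become a linear solve with the Hessian.

```python
import math


def newton_minimize(grad, hess, m0, iterations=6):
    """Scalar Newton iteration m <- m - g'(m) / g''(m) with unit step length."""
    m = m0
    history = [m]
    for _ in range(iterations):
        m = m - grad(m) / hess(m)
        history.append(m)
    return history


# g(m) = exp(m) - 2 m, so g'(m) = exp(m) - 2 and g''(m) = exp(m) > 0.
history = newton_minimize(lambda m: math.exp(m) - 2.0, math.exp, m0=0.0)
errors = [abs(m - math.log(2.0)) for m in history]
for k, err in enumerate(errors):
    print(k, err)
```

Once the iterate is close to the minimizer, the printed error is roughly squared at every step, so the number of correct digits about doubles per iteration.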

For non-linear least squares problems, where the objective function g(m) is a
sum of squares of nonlinear functions, the Gauss-Newton (GN) method is often
used [101]. Extensions to more general objective functions as in Eq. (13) with
covariance matrix $\Sigma$ and arbitrary regularization as in Eq. (14) are considered in [94].
Without loss of generality, assume $g(m) = \sum_{i=1}^{s} (f_i(m) - d_i)^2$. At iteration k, the
GN search direction $p_k$ is given by


$$\left(\sum_{i=1}^{s} J_i^T J_i\right) p_k = -\nabla g, \tag{18}$$


where the sensitivity matrix $J_i$ and the gradient $\nabla g$ are given by


$$J_i = \frac{\partial f_i}{\partial m}(m_k), \quad i = 1, \ldots, s, \tag{19}$$

$$\nabla g = 2 \sum_{i=1}^{s} J_i^T (f_i(m_k) - d_i). \tag{20}$$


The Gauss-Newton method can be seen as an approximation of the basic Newton’s
method obtained by replacing $\nabla^2 g$ by $\sum_{i=1}^{s} J_i^T J_i$. The step length $\lambda_k \in [0, 1]$ can
be determined by a weak line search [80] (using, say, the Armijo algorithm starting
with $\lambda_k = 1$) ensuring sufficient decrease in $g(m_{k+1})$ as compared to $g(m_k)$.
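
A minimal sketch of Gauss-Newton with an Armijo backtracking line search follows. The two-parameter exponential model, the noise-free synthetic data, and the constants of the line search are assumptions of this example. The sketch applies the factor 2 to both the gradient and the Hessian approximation, so the factor cancels in the linear system for the search direction.

```python
import math


def residuals_and_jacobian(m, t, d):
    """Residuals f_i(m) - d_i and sensitivities J_i for f_i(m) = m[0] * exp(m[1] * t_i)."""
    r, J = [], []
    for ti, di in zip(t, d):
        e = math.exp(m[1] * ti)
        r.append(m[0] * e - di)
        J.append((e, m[0] * ti * e))
    return r, J


def objective(m, t, d):
    r, _ = residuals_and_jacobian(m, t, d)
    return sum(ri * ri for ri in r)


def gauss_newton(m, t, d, max_iter=50, tol=1e-12):
    for _ in range(max_iter):
        r, J = residuals_and_jacobian(m, t, d)
        # Gradient of g: 2 * sum_i J_i^T r_i.
        grad = [2.0 * sum(Ji[k] * ri for Ji, ri in zip(J, r)) for k in range(2)]
        if math.hypot(grad[0], grad[1]) < tol:
            break
        # Gauss-Newton Hessian approximation 2 * sum_i J_i^T J_i, a 2x2 matrix [[a, b], [b, c]].
        a = 2.0 * sum(Ji[0] * Ji[0] for Ji in J)
        b = 2.0 * sum(Ji[0] * Ji[1] for Ji in J)
        c = 2.0 * sum(Ji[1] * Ji[1] for Ji in J)
        det = a * c - b * b
        # Solve H p = -grad by Cramer's rule.
        p = [(-grad[0] * c + grad[1] * b) / det, (-a * grad[1] + b * grad[0]) / det]
        # Armijo backtracking, starting from the full step.
        g0 = objective(m, t, d)
        slope = grad[0] * p[0] + grad[1] * p[1]  # negative for a descent direction
        lam = 1.0
        for _ in range(30):
            trial = [m[0] + lam * p[0], m[1] + lam * p[1]]
            if objective(trial, t, d) <= g0 + 1e-4 * lam * slope:
                break
            lam *= 0.5
        m = trial
    return m


t = [0.0, 0.5, 1.0, 1.5, 2.0]
d = [2.0 * math.exp(-ti) for ti in t]  # synthetic data generated with m = (2, -1)
estimate = gauss_newton([1.0, 0.0], t, d)
print(estimate)
```

Only first derivatives of the model are needed, which is what makes the method attractive when each sensitivity requires PDE solves. For large problems the small dense solve used here would be replaced by an iterative solver.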

Often several nontrivial modifications are required to adapt this prototype method
for different applications, e.g., dynamic regularization [53, 86, 87, 108] and the more
general stabilized GN studied in [30, 93]. The latter method replaces the solution of the
linear systems defining $p_k$ by r preconditioned conjugate gradient (PCG) inner
iterations, which costs 2r solutions of the forward problem per iteration, for a
moderate integer value r. Thus, if K outer iterations are required to obtain an
acceptable solution then the total work estimate (in terms of the number of PDE
solves) is approximated from below by 2(r + 1)Ks.

Though Gauss-Newton is arguably the method of choice within the inverse
problem community, other Newton-type methods have been designed
to deal with the non-convex nature of the underlying optimization problem,
including trust region [27, 114] and cubic regularization [23, 114] methods. These
methods have recently found applications in machine learning [115]. Studying
the advantages and disadvantages of these non-convex methods for solving inverse
problems can indeed be a useful undertaking.


Quasi-Newton Methods An alternative to the above Newton-type methods
is the family of quasi-Newton variants, including the celebrated limited memory BFGS
(L-BFGS) [68, 79]. BFGS iteration is closely related to conjugate gradient (CG)
iteration. In particular, BFGS applied to a strongly convex quadratic objective, with
exact line search as well as initial Hessian P, is equivalent to preconditioned CG
with preconditioner P. However, as the objective function departs from being a
simple quadratic, the number of iterations of L-BFGS could be significantly higher
than that of GN or trust region. In addition, it has been shown that the performance
of BFGS and its limited memory version is greatly negatively affected by the
high degree of ill-conditioning present in such problems [90, 91, 113]. These two
factors are among the main reasons why BFGS (and L-BFGS) can be less effective
compared with other Newton-type alternatives in many inversion applications [44].


Krylov Subspace Method A Krylov subspace method iteratively finds the optimal
solution to an optimization problem in a larger subspace by making use of the previous
solution in a smaller subspace. One of the most commonly used Krylov subspace
methods is the conjugate gradient (CG) method. CG was originally designed to solve
convex quadratic minimization problems of the form $g(m) = \frac{1}{2} m^T A m - b^T m$.
Equivalently, this solves the positive definite linear system Am = b. It computes a
sequence of iterates $m_0, m_1, \ldots$ converging to the minimum through the following
two sets of equations.


$$m_0 = 0, \quad r_0 = b, \quad p_0 = r_0, \tag{21}$$

$$m_{k+1} = m_k + \frac{\|r_k\|_2^2}{p_k^T A p_k} p_k, \quad r_{k+1} = r_k - \frac{\|r_k\|_2^2}{p_k^T A p_k} A p_k, \quad p_{k+1} = r_{k+1} + \frac{\|r_{k+1}\|_2^2}{\|r_k\|_2^2} p_k, \quad k \ge 0. \tag{22}$$


This can be used to solve the forward problem of the form $L_i(m) u_i = q_i$, provided
that $L_i(m)$ is positive definite, which is true in many cases.
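
The recurrences in Eqs. (21) and (22) translate almost line by line into code. The sketch below is a dense, pure-Python version for small systems; the 3 x 3 symmetric positive definite matrix, the stopping tolerance, and the iteration cap are assumptions made for this illustration.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def conjugate_gradient(A, b, tol=1e-12, max_iter=None):
    """Solve A m = b for symmetric positive definite A using Eqs. (21) and (22)."""
    n = len(b)
    max_iter = 2 * n if max_iter is None else max_iter
    m = [0.0] * n   # m_0 = 0
    r = list(b)     # r_0 = b
    p = list(r)     # p_0 = r_0
    rr = dot(r, r)
    iterations = 0
    while iterations < max_iter and rr > tol * tol:
        Ap = matvec(A, p)
        step = rr / dot(p, Ap)
        m = [mi + step * pi for mi, pi in zip(m, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
        iterations += 1
    return m, iterations


A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
solution, iterations = conjugate_gradient(A, b)
print(solution, iterations)
```

Each iteration needs a single product with A and never forms or factorizes the matrix, which is why CG suits the large sparse systems arising from discretized PDEs. In exact arithmetic it terminates in at most n iterations for an n x n system.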

CG can be used to solve the linear system for the basic Newton direction. 
However, the Hessian is not necessarily positive definite and modification is needed 
[80]. 

In general, CG can be generalized to minimize a nonlinear function g(m) [28,
39]. It starts with an arbitrary $m_0$, and $p_0 = -\nabla g(m_0)$, and computes a sequence
of iterates $m_1, m_2, \ldots$ using the equations below: for $k \ge 0$,

$$m_{k+1} = \arg\min_{m \in \{m_k + \lambda p_k,\ \lambda \in \mathbb{R}\}} g(m), \tag{23}$$

$$p_{k+1} = -\nabla g(m_{k+1}) + \beta_k p_k, \quad \text{where } \beta_k = \frac{\|\nabla g(m_{k+1})\|_2^2}{\|\nabla g(m_k)\|_2^2}. \tag{24}$$


The above formula for $\beta_k$ is known as the Fletcher-Reeves formula. Other choices of
$\beta_k$ exist. The following two formulas are known as the Polak-Ribière and
Hestenes-Stiefel formulas respectively.


$$\beta_k = \frac{\langle \nabla g(m_{k+1}) - \nabla g(m_k), \nabla g(m_{k+1}) \rangle}{\|\nabla g(m_k)\|_2^2}, \tag{25}$$

$$\beta_k = \frac{\langle \nabla g(m_{k+1}) - \nabla g(m_k), \nabla g(m_{k+1}) \rangle}{p_k^T \left(\nabla g(m_{k+1}) - \nabla g(m_k)\right)}. \tag{26}$$


In practice, nonlinear CG does not seem to work well, and is mainly used together
with other methods, such as in the Newton CG method [80].


Lagrangian Method of Multipliers The above discussion focuses on unconstrained
optimization algorithms, which are suitable for unconstrained formulations
of inverse problems, or unconstrained auxiliary optimization problems in methods
which solve the constrained formulations directly. The Lagrangian method of
multipliers is often used to directly solve the constrained version. Algorithms have
been developed to offset the heavier computational cost and slow convergence rates
of standard algorithms observed on the Lagrangian, which is a larger problem than
the constrained problem. For example, such algorithms may reduce the problem to
a smaller one, such as working with the reduced Hessian of the Lagrangian [47],
or use preconditioning [10, 45]. These methods have shown some success in certain
PDE-constrained optimization problems.

Augmented Lagrangian methods have also been developed (e.g. [1, 57]). Such
methods construct a series of penalized Lagrangians with vanishing penalty, and
find an optimizer of the Lagrangian by successively optimizing the penalized
Lagrangians.


2.5 Challenges 


Scaling up to Large Problems The discretized version of an inverse problem is 
usually of very large scale, and working with fine resolution or discretized problems 
in high dimension is still an active area of research. 

Another challenge is to scale up to a large number of measurements, which is
widely believed to be helpful for quality reconstruction of the model in practice,
with some theoretical support. While recent technological advances make many
big datasets available, existing algorithms cannot efficiently cope with such datasets.
Examples of such problems include electromagnetic data inversion in mining 
exploration [33, 48, 78, 81], seismic data inversion in oil exploration [38, 56, 88], 
diffuse optical tomography (DOT) [6, 14], quantitative photo-acoustic tomography 
(QPAT) [42, 117], direct current (DC) resistivity [30, 49, 50, 83, 100], and electrical 
impedance tomography (EIT) [16, 24, 110]. 

It has been suggested that many well-placed experiments yield a practical advantage
in obtaining reconstructions of acceptable quality. For the special case
where the measurement locations as well as the discretization matrices do not
change from one experiment to another, various approximation techniques have
been proposed to reduce the effective number of measurements, which in turn
implies a smaller scale optimization problem, under the unifying category of
“simultaneous sources inversion” [46, 63, 89, 93, 94]. Under certain circumstances,
even if the $P_i$’s are different across experiments (but the $L_i$’s are fixed), there are
methods to transform the existing data set into one where all sources share the
same receivers [92].
Dealing with Non-convexity Another major source of difficulty in solving many
inverse problems is the high degree of non-linearity and non-convexity in (1). This
is most often encountered in problems involving PDE-constrained optimization
where each $f_i$ corresponds to the solution of a PDE. Even if the output of
the PDE model itself, i.e., the “right-hand side”, is linear in the sought-after
parameter, the solution of the PDE, i.e., the forward problem, shows a great deal
of non-linearity. This, coupled with a great amount of non-convexity, can have
significant consequences for the quality of the inversion and the obtained parameter.
Indeed, in the presence of non-convexity, the large-scale computational challenges are
exacerbated, many times over, by the difficulty of avoiding (possibly degenerate)
saddle-points as well as finding (at least) a local minimum.


Dealing with Discontinuity While the parameter function of the model is often 
smooth, the parameter function can be discontinuous in some cases. Such discon- 
tinuities arise very naturally as a result of the physical properties of the underlying 
physical system, e.g., EIT and DC resistivity, and require non-trivial modifications 
to optimization algorithms, e.g., [30, 93]. Ignoring such discontinuities can lead 
to unsatisfactory recovery results [30, 31, 103]. The level set method [82] is
often used to model discontinuous parameter functions. This reparametrizes the
discontinuous parameter function as a differentiable one, thus enabling more
stable optimization [31].


3 Recent Advances in Optimization 


Recent successes in using machine learning to deal with challenging perception and 
natural language understanding problems have spurred many advances in the study 
of optimization algorithms as optimization is a building block in machine learning. 
These new developments include efficient methods for large-scale optimization, 
methods designed to handle non-convex problems, methods incorporating the 
structural constraints, and finally the revival of second-order methods. While these
developments address a different set of applications in machine learning, they
address issues similar to those encountered in inverse problems and could be useful there.
We highlight some of the works below. We keep the discussion brief because
a large body of work lies behind these developments and an in-depth and
comprehensive discussion is beyond the scope of this review. Our objective is thus to
delineate the general trends and ideas, and provide references for interested readers
to dig into relevant topics.


Stochastic Optimization The development in large-scale optimization methods is 
driven by the availability of many large datasets, which are made possible by the 
rapid development and extensive use of computers and information technology. In 
machine learning, a model is generally built by optimizing a sum of misfit on the 
examples. This finite-sum structure naturally invites the application of stochastic 
optimization algorithms. This is mainly due to the fact that stochastic algorithms 
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recover the sought-after models more efficiently by employing small batches of data 
in each iteration, as opposed to the whole data-set. The most well-known stochastic 
gradient based algorithm is the stochastic gradient descent (SGD). To minimize a 
finite-sum objective function 


1 n 
g(m) = — d, gi(m), (27) 


in the big data regime where $n \gg 1$, the vanilla SGD performs an update 

$$m_{k+1} = m_k - \lambda_k \nabla g_{i_k}(m_k), \tag{28}$$

where $\lambda_k$ is the step size and $i_k$ is randomly sampled from $1, \dots, n$. 
Compared to gradient descent, SGD replaces the full gradient $\nabla g(m)$ by a 
stochastic gradient $\nabla g_{i_k}(m_k)$ whose expectation is the full gradient. The 
batch version of SGD constructs a stochastic 
gradient by taking the average of several stochastic gradients. 
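
The update (28) and its batch variant can be sketched as follows. This is only an illustration under stated assumptions: the model is a scalar, the misfits g_i(m) = (m - d_i)^2 / 2 are a toy choice whose minimizer is the mean of the d_i, and the function name `sgd`, the data and the 1/(k+1) step-size schedule are hypothetical and not taken from the text.

```python
import random

def sgd(grad_i, n, m0, step, iters, batch=1, seed=0):
    """Minibatch SGD for g(m) = (1/n) * sum_i g_i(m), scalar m (illustrative)."""
    rng = random.Random(seed)
    m = m0
    for k in range(iters):
        # Sample `batch` indices uniformly from 0..n-1.
        idx = [rng.randrange(n) for _ in range(batch)]
        # Average of the sampled gradients: an unbiased estimate of the full gradient.
        g = sum(grad_i(i, m) for i in idx) / batch
        m -= step(k) * g
    return m

# Toy finite sum (assumed data): g_i(m) = 0.5 * (m - d_i)^2, so grad g_i(m) = m - d_i.
data = [1.0, 2.0, 3.0, 4.0, 10.0]
m_hat = sgd(lambda i, m: m - data[i], len(data), m0=0.0,
            step=lambda k: 1.0 / (k + 1), iters=20000)
print(m_hat, sum(data) / len(data))
```

With this particular schedule the iterate is just the running average of the sampled d_i, which makes the slow, sampling-noise-limited convergence of vanilla SGD easy to see.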

Vanilla SGD is inexpensive per iteration, but suffers from a slow rate of 
convergence. For example, while full gradient descent achieves a linear convergence 
rate for smooth strongly convex problems, SGD only converges at a sublinear 
rate. The slow convergence rate can be partly accounted for by the variance in the 
stochastic gradient. Recently, variance reduction techniques have been developed, 
e.g. SVRG [61] and SDCA [99]. Perhaps surprisingly, such variants can achieve 
linear convergence rates on smooth strongly convex problems, as full gradient 
descent does, instead of the sublinear rates achieved by vanilla SGD. There are also 
a number of variants that have no known linear rates but converge fast on non-convex 
problems in practice, e.g., AdaGrad [34], RMSProp [105], ESGD [29], Adam [62], 
and Adadelta [118]. Indeed, besides efficiency, stochastic optimization algorithms 
also seem to cope well with nonconvex objective functions, and play 
a key role in the revival of neural networks as deep learning [43, 60, 66]. 
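
To make the variance-reduction idea concrete, here is a minimal sketch in the spirit of SVRG [61]. It is an assumption-laden illustration, not the reference algorithm: the model is a scalar, the toy misfits are g_i(m) = a_i (m - d_i)^2 / 2, and the names `svrg`, `a`, `d` as well as the step size and loop lengths are hypothetical choices.

```python
import random

def svrg(grad_i, n, m0, step, epochs, inner, seed=0):
    """SVRG-style iteration for g(m) = (1/n) * sum_i g_i(m), scalar m (illustrative)."""
    rng = random.Random(seed)
    m_snap = m0
    for _ in range(epochs):
        # Full gradient at the snapshot, computed once per epoch.
        full = sum(grad_i(i, m_snap) for i in range(n)) / n
        m = m_snap
        for _ in range(inner):
            i = rng.randrange(n)
            # Variance-reduced direction: still unbiased for the full gradient,
            # but its variance shrinks as m and m_snap approach the minimizer.
            v = grad_i(i, m) - grad_i(i, m_snap) + full
            m -= step * v
        m_snap = m
    return m_snap

# Toy strongly convex finite sum (assumed): g_i(m) = 0.5 * a_i * (m - d_i)^2.
a = [1.0, 2.0, 3.0, 4.0]
d = [1.0, 2.0, 3.0, 4.0]
m_star = sum(ai * di for ai, di in zip(a, d)) / sum(a)  # weighted mean, equals 3.0
m_svrg = svrg(lambda i, m: a[i] * (m - d[i]), len(a), m0=0.0,
              step=0.1, epochs=30, inner=50)
print(m_svrg, m_star)
```

Unlike the SGD sketch, a constant step size is used here; the correction term is what removes the need for a decaying schedule on this strongly convex toy problem.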

Recently, it has also been shown that SGD can be used as a variational algorithm 
for computing the posterior distribution of parameters given observations [72]. This 
can be useful in the Bayesian approach for solving inverse problems. 


Nonconvex Optimization There has also been increasing interest in non-convex 
optimization in the machine learning community recently. Nonconvex objectives 
occur naturally not only in deep learning, but also in problems such as tensor 
decomposition, variable selection, and low-rank matrix completion; see e.g. 
[43, 59, 73] and references therein. 

As discussed above, stochastic algorithms have been found to be capable of 
effectively escaping local minima. There are also a number of studies which 
adapt well-known acceleration techniques for convex optimization to accelerate the 
convergence rates of both stochastic and non-stochastic optimization algorithms for 
nonconvex problems, e.g., [4, 67, 85, 102]. 


Dealing with Structural Constraints Many problems in machine learning come 
with complex structural constraints. The Frank-Wolfe algorithm (a.k.a. conditional 
gradient) [41] is an algorithm for optimizing over a convex domain. It has gained 
a revived interest due to its ability to deal with many structural constraints 
efficiently. It requires solving a linear minimization problem over the feasible set, 
instead of a quadratic program as in the case of proximal gradient algorithms or 
projected gradient descent. Domains suitable for the Frank-Wolfe algorithm include 
simplices, $\ell_p$-balls, matrix nuclear norm balls, and matrix operator norm balls [58]. 

The Frank-Wolfe algorithm belongs to the class of linear-optimization-based 
algorithms [64, 65]. These algorithms share with the Frank-Wolfe algorithm the 
characteristic of requiring a first-order oracle for gradient computation and an oracle 
for solving a linear optimization problem over the constraint set. 
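
As an illustration of the two oracles, the following sketch runs Frank-Wolfe over the probability simplex, where the linear minimization step reduces to picking the coordinate with the smallest gradient entry. The function name `frank_wolfe_simplex`, the toy objective f(x) = ||x - b||^2 / 2 and the open-loop step size 2/(k+2) are assumptions made for the example, not details from the text.

```python
def frank_wolfe_simplex(grad, x0, iters):
    """Frank-Wolfe over the simplex {x >= 0, sum(x) = 1} (illustrative sketch)."""
    x = list(x0)
    for k in range(iters):
        g = grad(x)  # first-order oracle
        # Linear minimization oracle: a linear function over the simplex is
        # minimized at the vertex e_j with the smallest gradient entry.
        j = min(range(len(x)), key=lambda i: g[i])
        gamma = 2.0 / (k + 2.0)  # common open-loop step size
        # Convex combination of x and the vertex e_j: stays feasible, no projection.
        x = [(1.0 - gamma) * xi for xi in x]
        x[j] += gamma
    return x

# Toy objective (assumed): f(x) = 0.5 * ||x - b||^2 with b inside the simplex.
b = [0.2, 0.5, 0.3]
x_fw = frank_wolfe_simplex(lambda x: [xi - bi for xi, bi in zip(x, b)],
                           [1.0, 0.0, 0.0], 2000)
print(x_fw)
```

Every iterate is a convex combination of simplex vertices, which is why no projection or quadratic subproblem is needed.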


Second-Order Methods The great appeal of the second-order methods lies mainly 
in the observed empirical performance as well as some very appealing theoretical 
properties. For example, it has been shown that stochastic Newton-type methods 
in general, and Gauss-Newton in particular, can not only be made scalable and 
have low per-iteration cost [30, 47, 51, 92-94], but more importantly, and unlike 
first-order methods, are very resilient to many adversarial effects such as ill- 
conditioning [90, 91, 113]. As a result, for moderately to very ill-conditioned 
problems, commonly found in scientific computing, while first-order methods make 
effectively no progress at all, second-order counterparts are not affected by the 
degree of ill-conditioning. A more subtle, yet potentially more severe, drawback of 
using first-order methods is that their success is tightly intertwined with fine-tuning 
(often many) hyper-parameters, most importantly the step-size [11]. In fact, it is 
highly unlikely that many of these methods exhibit acceptable performance on the first 
try, and it often takes much trial and error before one can see reasonable results. In 
contrast, second-order optimization algorithms involve much less parameter tuning 
and are less sensitive to the choice of hyper-parameters [11, 115]. 

Since for the finite-sum problem (27) with $n \gg 1$, the operations with the 
Hessian/gradient constitute major computational bottlenecks, a rather more recent 
line of research is to construct inexact Hessian information using 
randomized methods. Specifically, for convex optimization, the stochastic 
approximation of the full Hessian matrix in the classical Newton’s method has been 
recently considered in [3, 11, 15, 19, 20, 35, 37, 75, 76, 84, 90, 91, 112, 113, 116]. In 
addition to inexact Hessian, a few of these methods study the fully stochastic case in 
which the gradient is also approximated, e.g., [15, 90, 91]. For non-convex problems, 
however, the literature on methods that employ randomized Hessian approximation 
is significantly less developed than that of convex problems. A few recent examples 
include the stochastic trust region [114], stochastic cubic regularization [106, 114], 
and noisy negative curvature method [69]. Empirical performance of many of these 
methods for some non-convex machine learning applications has been considered 
in [115]. 


3.1 A Concrete Success Story 


The development of optimization methods in the machine learning community 
has been fueled by the need to obtain better generalization performance on 
future “unseen” data. This is in contrast with typical inverse problem applications, 
where fitting the model to the observations on hand is all that matters. 
These rather strikingly different goals have led the ML community to develop 
optimization methods that can address ML-specific challenges. This, in part, has 
given rise to scalable algorithms that can often deliver far beyond what the most 
widely used optimization methods in the inverse problem community can. 

As a concrete example, consider L-BFGS and Gauss-Newton, which are, 
arguably, among the most popular optimization techniques used by the scientific 
computing community in a variety of inverse problem applications. In fact, unlike 
the Gauss-Newton method, L-BFGS, due to its low per-iteration costs, has found 
significant traction within the machine learning community as well. Nevertheless, 
due to the resurgence of non-convex deep learning problems in ML, there is an 
increasing demand for scalable optimization algorithms that can avoid saddle 
points and converge to a local minimum. This demand has driven the development 
of algorithms that can surpass the performance of L-BFGS and Gauss-Newton when 
applied to deep learning applications, e.g., [115]. 

These results are not unexpected. Indeed, contrary to popular belief, BFGS 
is not quite a “full-fledged” second-order method as it merely employs first- 
order information, i.e. gradients, to approximate the curvature. Similar in spirit, 
Gauss-Newton also does not fully utilize the Hessian information. In particular, in 
exchange for obtaining a positive definite approximation matrix, GN completely 
ignores the information from negative curvature, which is critical for escaping 
from regions with small gradient. Escaping saddle points and converging 
to a local minimum with lower objective values have surprisingly not been a huge 
concern for the inverse problem community. This is in sharp contrast to the machine 
learning applications where obtaining lower training errors with deep learning 
models typically translates to better generalization performance. 
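
The loss of negative-curvature information in Gauss-Newton can be seen on a one-dimensional toy problem. The sketch below is purely illustrative: the residual r(m) = m^2 - 1 and all function names are assumptions, and the stationary point m = 0 is a one-dimensional stand-in for a saddle point rather than an example from the text.

```python
def residual(m):
    return m * m - 1.0           # r(m); the objective is f(m) = 0.5 * r(m)^2

def jacobian(m):
    return 2.0 * m               # r'(m)

def gradient(m):
    return residual(m) * jacobian(m)   # f'(m) = r(m) * r'(m)

def full_hessian(m):
    # f''(m) = r'(m)^2 + r(m) * r''(m), with r''(m) = 2 for this residual.
    return jacobian(m) ** 2 + residual(m) * 2.0

def gauss_newton_hessian(m):
    # Gauss-Newton keeps only r'(m)^2, which can never be negative.
    return jacobian(m) ** 2

m = 0.0  # stationary point of f that is not a minimizer
print(gradient(m), full_hessian(m), gauss_newton_hessian(m))
```

At m = 0 the gradient vanishes and the true second derivative is negative, so a method that sees the full Hessian can move away, whereas the Gauss-Newton model is flat there and offers no direction of descent.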


4 Discussion 


Optimization is not only used in the data fitting approach to inverse problems, but 
also used in the Bayesian approach. An important problem in the Bayesian approach 
is the choice of the parameters for the prior. While these were often chosen in a 
somewhat ad hoc way, there are studies which use sampling [2, 40], hierarchical 
prior models [21, 22], and optimization [9, 70] methods to choose the parameters. 
While choosing the prior parameters through optimization has found some success, 
such optimization is hard and it remains a challenge to develop effective algorithms 
to solve these problems. 

For inverse problems with a large number of measurements, solving each forward 
problem can be expensive, and the mere evaluation of the misfit function may 
become computationally prohibitive. Stochastic optimization algorithms might be 
beneficial in this case, because the objective function is often a sum of misfits over 
different measurements. 

The data fitting problem is generally non-convex and thus optimization algo- 
rithms may be trapped in a local optimum. Stochastic optimization algorithms 
also provide a means to escape the local optima. Recent results in nonconvex 
optimization, such as those on accelerated methods, may provide more efficient 
alternatives to solve the data fitting problem. 

While box constraints are often used in inverse problems because they are easier 
to deal with, simplex constraints can be beneficial. The Frank-Wolfe algorithm 
provides an efficient way to deal with simplex constraints, and can be a useful 
addition to the toolbox of an inverse problem researcher. 


5 Conclusion 


State-of-the-art optimization methods in the inverse problem community struggle to 
cope with important issues such as large-scale problems and nonconvexity. At the 
same time, much progress in optimization has been made in the machine learning 
community. Our discussion of the connections has been brief. Nevertheless, we 
have highlighted the valuable potential synergies that are to be reaped by bringing 
these two communities closer together. 


Diagonal Form Factors 
from Non-diagonal Ones 


Zoltan Bajnok and Chao Wu 


Abstract We prove the asymptotic large volume expression of diagonal form 
factors in integrable models by evaluating carefully the diagonal limit of a non- 
diagonal form factor in which we send the rapidity of the extra particle to infinity. 


1 Introduction 


Two dimensional integrable quantum field theories are useful toy models of 
statistical and particle physics as they provide many interesting observables, which 
can be calculated exactly [12]. These models are first solved in infinite volume, 
where the scattering matrix [4, 21], which connects asymptotic multiparticle states, 
is determined together with the form factors, which are the matrix elements of local 
operators sandwiched between the same asymptotic states [19]. These form factors 
then can be used to build up the correlation functions, which define the theory in the 
Wightman sense [1]. 

In the relevant practical applications, however, quantum field theories are con- 
fined to a finite volume and the calculation of finite size corrections is unavoidable. 
Fortunately, all these finite size corrections can be expressed in terms of the 
infinite volume characteristics, such as masses, scattering matrices and form factors 
[10, 11, 15]. We can distinguish three domains in the volume according to the nature 
of the corrections. The leading finite size corrections are polynomial in the inverse 
power of the volume, while the sub-leading corrections are exponentially volume- 
suppressed. 

Concerning the finite volume energy spectrum, the domain in which only polynomial 
corrections are kept is called the Bethe-Yang (BY) domain. There we merely 
need to take into account the finite volume quantization of the momenta, which 
originates from the periodicity requirement and explicitly includes the scattering 
phase-shifts [11]. The exponentially small corrections are due to virtual particles 
traveling around the world, and the domain in which we keep only the leading 
exponential correction is called the Lüscher domain [10]. In a small volume, when 
all exponentials contribute the same way, we have to sum them up, leading to a 
description given by the Thermodynamic Bethe Ansatz (TBA) [20]. 

The situation for the form factors is not understood at the same level yet. The 
BY domain was investigated in [15, 16]. It was proven for non-diagonal form factors 
that all polynomial finite size effects come only from the finite volume (Kronecker- 
delta) normalization of states. The authors also conjectured the BY form of diagonal 
finite volume form factors, which they derived for two particle-states. The leading 
exponential finite size corrections for generic form factors are not known, except 
for the diagonal ones, for which exact conjectures exist. The LeClair-Mussardo 
(LM) conjecture expresses the exact finite volume/temperature one-point functions 
in terms of infinite volume diagonal connected form factors, and densities of mirror 
states determined by the TBA equation [9]. Actually it was shown in [13, 14] that 
the BY form of diagonal form factors implies the LM formula and vice versa. 
Using analytical continuation à la [5], Pozsgay extended the LM formula to finite 
volume diagonal matrix elements [17]. The aim of the present paper is to prove the 
conjectured BY form of diagonal form factors [16, 18] from the already proven non- 
diagonal BY form factors [15] by carefully calculating the diagonal limit, in which 
we send one particle’s rapidity to infinity. In this way our result also leads to the 
proof of the LM formula. Here we focus on theories with one type of particles. 

The paper is organized such that in the next section we summarize the known 
facts about the BY form of diagonal and non-diagonal form factors. We then in 
Sect. 3 prove the diagonal conjecture and conclude in Sect. 4. 


2 The Conjecture for Diagonal Large Volume Form Factors 
In this section we introduce the infinite volume form factors and their properties and 
use them later on to describe the finite volume form factors in the BY domain. 


2.1 Infinite Volume Form Factors 


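Infinite volume form factors are the matrix elements of local operators sandwiched 
between asymptotic states $\langle\theta'_1,\dots,\theta'_m|\mathcal{O}|\theta_n,\dots,\theta_1\rangle$. We use the rapidity $\theta$ to 
parametrize the momenta as $p = m\sinh\theta$. The crossing formula 

$$\langle\theta'_1,\dots,\theta'_m|\mathcal{O}|\theta_n,\dots,\theta_1\rangle = \langle\theta'_1,\dots,\theta'_{m-1}|\mathcal{O}|\bar\theta'_m - i\epsilon,\theta_n,\dots,\theta_1\rangle + \sum_{i=1}^{n} 2\pi\delta(\theta'_m-\theta_i)\prod_{j=i+1}^{n}S(\theta_j-\theta_i)\,\langle\theta'_1,\dots,\theta'_{m-1}|\mathcal{O}|\theta_{n-1},\dots,\theta_1\rangle \tag{1}$$

can be used to express every matrix element in terms of the elementary form factors 

$$\langle 0|\mathcal{O}|\theta_n,\dots,\theta_1\rangle = F_n(\theta_n,\dots,\theta_1) \tag{2}$$

where $\bar\theta = \theta + i\pi$ denotes the crossed rapidity and the two particle S-matrix satisfies 
$S(\theta) = S(i\pi-\theta) = S(-\theta)^{-1}$. Infinite volume states are normalized to Dirac 
$\delta$-functions as $\langle\theta'|\theta\rangle = 2\pi\delta(\theta-\theta')$. The elementary form factor satisfies the 
permutation and periodicity axiom 

$$F_n(\theta_1,\theta_2,\dots,\theta_i,\theta_{i+1},\dots,\theta_n) = S(\theta_i-\theta_{i+1})\,F_n(\theta_1,\theta_2,\dots,\theta_{i+1},\theta_i,\dots,\theta_n) = F_n(\theta_2,\dots,\theta_i,\theta_{i+1},\dots,\theta_n,\theta_1-2i\pi) \tag{3}$$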


together with the kinematical singularity relation 


$$-i\,\mathrm{Res}_{\theta'=\theta}F_{n+2}(\theta'+i\pi,\theta,\theta_1,\dots,\theta_n) = \Big(1-\prod_{i=1}^{n}S(\theta-\theta_i)\Big)F_n(\theta_1,\dots,\theta_n) \tag{4}$$


For scalar operators, when properly normalized, the form factor also satisfies the 
cluster property 


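$$\lim_{\Lambda\to\infty}F_{n+m}(\theta_1+\Lambda,\dots,\theta_n+\Lambda,\theta_{n+1},\dots,\theta_{n+m}) = F_n(\theta_1,\dots,\theta_n)\,F_m(\theta_{n+1},\dots,\theta_{n+m}) \tag{5}$$

which will be used to analyze the diagonal limit of $\langle\theta,\theta'_1,\dots,\theta'_n|\mathcal{O}|\theta_n,\dots,\theta_1\rangle$ via 
$\theta\to\infty$ in finite volume. 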

The diagonal form factors $\langle\theta_1,\dots,\theta_n|\mathcal{O}|\theta_n,\dots,\theta_1\rangle$ are singular due to the $\delta(0)$ 
terms coming from the normalization of the states and also from poles related to 
the kinematical singularity axiom. Actually, $F_{2n}(\bar\theta_1+\epsilon_1,\dots,\bar\theta_n+\epsilon_n,\theta_n,\dots,\theta_1)$ is 
not singular when all $\epsilon_i$ go to zero simultaneously, but depends on the direction of 
the limit. The connected diagonal form factor is defined as the finite $\epsilon$-independent 
part: 

$$F_n^c(\theta_1,\dots,\theta_n) = \mathrm{Fp}\left(F_{2n}(\bar\theta_1+\epsilon_1,\dots,\bar\theta_n+\epsilon_n,\theta_n,\dots,\theta_1)\right) \tag{6}$$

while the symmetric evaluation is simply 

$$F_n^s(\theta_1,\dots,\theta_n) = \lim_{\epsilon\to0}F_{2n}(\bar\theta_1+\epsilon,\dots,\bar\theta_n+\epsilon,\theta_n,\dots,\theta_1) \tag{7}$$

where Fp means the finite part. In order to understand the singularity structure of the 
diagonal limit we note that the singular part can very nicely be visualized by graphs 
[16]: 

$$F_{2n}(\bar\theta_1+\epsilon_1,\dots,\bar\theta_n+\epsilon_n,\theta_n,\dots,\theta_1) = \sum_{\text{allowed graphs}}F(\text{graph}) + O(\epsilon_i) \tag{8}$$

where an allowed graph is an oriented tree-like (no-loop) graph in which at each 
vertex there is at most one outgoing edge. The contribution of a graph, $F(\text{graph})$, 
can be evaluated as follows: points $(i_1,\dots,i_k)$ with no outgoing edges contribute 
a factor $F_k^c(\theta_{i_1},\dots,\theta_{i_k})$, while for each edge from $i$ to $j$ we associate a factor 
$\frac{\epsilon_j}{\epsilon_i}\phi(\theta_i-\theta_j)$, where $\phi(\theta) = -i\partial_\theta\log S(\theta) = -i\frac{S'(\theta)}{S(\theta)}$. We recall the proof of (8) 
from [16] as similar arguments will be used later on. The proof goes by 
induction in $n$ and evaluates the residue at $\epsilon_n = 0$ keeping all other $\epsilon$'s finite. Clearly 
such a singular term can come only from graphs in which $n$ has only an outgoing edge 
and no incoming ones. The contributions of such terms are 

$$\frac{1}{\epsilon_n}(\epsilon_1\phi_{1n}+\dots+\epsilon_{n-1}\phi_{n-1\,n})\,F_{2n-2}(\bar\theta_1+\epsilon_1,\dots,\bar\theta_{n-1}+\epsilon_{n-1},\theta_{n-1},\dots,\theta_1) \tag{9}$$

where $\phi_{jk} = \phi_{kj} = \phi(\theta_j-\theta_k)$. Now comparing this expression to the kinematical 
singularity axiom and using the definition of $\phi(\theta)$ together with the properties of 
the scattering matrix we can see that they completely agree. The formula (8) can be 
used to define connected form factors recursively by subtracting the singular terms 
and taking the diagonal limit. Observe also that taking all $\epsilon$ to be the same makes 
the lhs. of (8) the symmetric form factor, which is expressed by (8) in terms of the 
connected ones. 
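
The set of allowed graphs in (8) can be enumerated directly for small n. The sketch below is an illustration only: it encodes a graph by a hypothetical "parent map" (the names `allowed_graphs` and `_reaches_root` are not from the text) and keeps exactly the oriented graphs with at most one outgoing edge per vertex and no loops.

```python
from itertools import product

def _reaches_root(parent, i):
    """Follow outgoing edges from i; return False if a cycle is encountered."""
    seen = set()
    while i != 0:
        if i in seen:
            return False
        seen.add(i)
        i = parent[i]
    return True

def allowed_graphs(n):
    """Yield parent maps on vertices 1..n: parent[i] = j encodes the edge i -> j,
    parent[i] = 0 means that vertex i has no outgoing edge."""
    for choice in product(range(n + 1), repeat=n):
        parent = {i + 1: choice[i] for i in range(n)}
        if any(parent[i] == i for i in parent):
            continue  # an edge from a vertex to itself is not allowed
        if all(_reaches_root(parent, i) for i in parent):
            yield parent

for n in (1, 2, 3):
    print(n, len(list(allowed_graphs(n))))
```

For two vertices this enumeration returns three graphs: no edge, and a single edge in either direction.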
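In particular, for the 2-particle form factor we have only three graphs: 

[Figure: the three allowed graphs on the vertices 1 and 2: no edge, and a single edge in either direction] 

which give 

$$F_4(\bar\theta_1+\epsilon_1,\bar\theta_2+\epsilon_2,\theta_2,\theta_1) = F_2^c(\theta_1,\theta_2) + \frac{\epsilon_1}{\epsilon_2}\phi_{12}F_1^c(\theta_1) + \frac{\epsilon_2}{\epsilon_1}\phi_{21}F_1^c(\theta_2) + O(\epsilon_i) \tag{10}$$

This equation on the one hand can be used to define $F_2^c(\theta_1,\theta_2)$, once $F_1^c(\theta)$ has 
already been defined, and on the other hand, it connects the symmetric form factor 
to the connected one: 

$$F_2^s(\theta_1,\theta_2) = F_2^c(\theta_1,\theta_2) + \phi_{12}F_1^c(\theta_1) + \phi_{21}F_1^c(\theta_2) \tag{11}$$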


2.2 Finite Volume Form Factors in the BY Domain 


In the BY domain we drop the exponentially suppressed $O(e^{-mL})$ terms and keep 
only the $O(L^{-1})$ polynomial volume dependence. The quantization of the momenta 
is given by the BY equations 

$$Q_j \equiv p(\theta_j)L - i\sum_{k:k\neq j}\log S(\theta_j-\theta_k) = 2\pi I_j \tag{12}$$


An $n$-particle state is labeled by the integers $I_j$, which can be traded for the 
momenta: $|I_1,\dots,I_n\rangle \equiv |\theta_1,\dots,\theta_n\rangle_L$. These states are normalized to Kronecker 
delta functions $\langle I'|I\rangle = \prod_j\delta_{I'_jI_j}$. Since two point functions in finite and infinite 
volume are equal up to exponentially small $O(e^{-mL})$ terms, the finite and infinite 
volume form factors differ only in the normalization of states [15]. In particular, this 
implies the non-diagonal finite volume form factor formula 

$$\langle\theta'_1,\dots,\theta'_m|\mathcal{O}|\theta_n,\dots,\theta_1\rangle_L = \frac{F_{n+m}(\bar\theta'_1,\dots,\bar\theta'_m,\theta_n,\dots,\theta_1)}{\sqrt{\rho'_m\rho_n}} + O(e^{-mL}) \tag{13}$$

where the densities of states are defined through the Bethe Ansatz equation via 

$$\rho_n = \det|Q_{ij}|; \qquad Q_{ij} = \partial_iQ_j \equiv \frac{\partial Q_j}{\partial\theta_i} \tag{14}$$
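The conjectured formula for diagonal form factors takes the form [18]: 

$$\langle\theta_1,\dots,\theta_n|\mathcal{O}|\theta_n,\dots,\theta_1\rangle_L = \frac{\sum_{\alpha\cup\bar\alpha=I}F^c_\alpha\,\rho_{\bar\alpha}}{\rho_n} + O(e^{-mL}) \tag{15}$$

where the index set $I = \{1,\dots,n\}$ is split in all possible ways $I = \alpha\cup\bar\alpha$, 
$F^c_\alpha = F^c_k(\theta_{\alpha_1},\dots,\theta_{\alpha_k})$ with $|\alpha| = k$ and $\rho_{\bar\alpha}$ is the shorthand for 
$\rho_{n-k}(\theta_{\bar\alpha_1},\dots,\theta_{\bar\alpha_{n-k}})$, 
which denotes the sub-determinant of the matrix $Q_{ij}$ with indices only from $\bar\alpha$. 
There is an analogous expression in terms of the symmetric form factors [16] 

$$\langle\theta_1,\dots,\theta_n|\mathcal{O}|\theta_n,\dots,\theta_1\rangle_L = \frac{\sum_{\alpha\cup\bar\alpha=I}F^s_\alpha\,\rho^s_{\bar\alpha}}{\rho_n} + O(e^{-mL}) \tag{16}$$

where now $\rho^s_\alpha$ is the density of states corresponding to the variables with labels in $\alpha$. 
The equivalence of the two formulas was shown in [16]. Let us note that for $L = 0$ 
the sum reduces to one single term, $\sum_{\alpha\cup\bar\alpha=I}F^s_\alpha\rho^s_{\bar\alpha} \to F^s_n$, as all other $\rho^s$ factors vanish. 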

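Let us spell out the details for two particles. The diagonal finite volume form 
factor up to exponential correction is 

$$\langle\theta_1,\theta_2|\mathcal{O}|\theta_2,\theta_1\rangle_L = \frac{F_2^c(\theta_1,\theta_2)+\rho_1(\theta_1)F_1^c(\theta_2)+\rho_1(\theta_2)F_1^c(\theta_1)+\rho_2(\theta_1,\theta_2)F_0}{\rho_2(\theta_1,\theta_2)} \tag{17}$$

where 

$$\rho_2(\theta_1,\theta_2) = \begin{vmatrix} E_1L+\phi_{12} & -\phi_{12} \\ -\phi_{21} & E_2L+\phi_{21}\end{vmatrix}; \qquad \rho_1(\theta_i) = E_iL+\phi_{i\,3-i}$$

with $E_i = \partial_ip(\theta_i)$. The analogous formula with the symmetric evaluation reads 
as 

$$\langle\theta_1,\theta_2|\mathcal{O}|\theta_2,\theta_1\rangle_L = \frac{F_2^s(\theta_1,\theta_2)+\rho_1^s(\theta_1)F_1^s(\theta_2)+\rho_1^s(\theta_2)F_1^s(\theta_1)+\rho_2^s(\theta_1,\theta_2)F_0}{\rho_2(\theta_1,\theta_2)} \tag{18}$$

where 

$$\rho_2^s(\theta_1,\theta_2) = \rho_2(\theta_1,\theta_2); \qquad \rho_1^s(\theta_i) = E_iL$$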
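
The agreement of the numerators of (17) and (18) can be checked with exact rational arithmetic. The sketch below is only an illustration: the numerical values standing in for E_i L, the phase derivative and the form factors are arbitrary assumptions, the helper names `det` and `by_jacobian` are hypothetical, and it uses the relation (11) together with the fact that for one particle the symmetric and connected evaluations coincide.

```python
from fractions import Fraction as Fr

def det(a):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** c * a[0][c] * det([row[:c] + row[c + 1:] for row in a[1:]])
               for c in range(len(a)))

def by_jacobian(EL, phi):
    """Matrix Q_ij of (14) built from the BY equations (12):
    diagonal E_j L + sum_{k != j} phi_jk, off-diagonal -phi_ij."""
    n = len(EL)
    Q = [[-phi[i][j] for j in range(n)] for i in range(n)]
    for j in range(n):
        Q[j][j] = EL[j] + sum(phi[j][k] for k in range(n) if k != j)
    return Q

# Arbitrary stand-in values (assumptions, not physical data).
EL = [Fr(7, 2), Fr(5, 3)]          # E_1 L, E_2 L
p = Fr(2, 7)                       # phi_12 = phi_21
Q = by_jacobian(EL, [[0, p], [p, 0]])
rho2 = det(Q)
rho1 = [Q[0][0], Q[1][1]]          # sub-determinants with a single index
F0, F1c, F2c = Fr(1), [Fr(-2, 3), Fr(9, 4)], Fr(4, 5)

num_connected = F2c + rho1[0] * F1c[1] + rho1[1] * F1c[0] + rho2 * F0
F2s = F2c + p * F1c[0] + p * F1c[1]                      # relation (11)
num_symmetric = F2s + EL[0] * F1c[1] + EL[1] * F1c[0] + rho2 * F0
print(num_connected == num_symmetric)
```

The same `by_jacobian` helper builds the matrix for any number of particles, so the sub-determinants entering (15) can be explored in the same way.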


3 The Proof for Diagonal Large Volume Form Factors 


The idea of the proof follows from the large $\theta$ behaviour of the scattering matrix, 
namely $S(\theta)\to1$ for $\theta\to\infty$. This also lies behind the cluster property of the form 
factors. Thus by taking the non-diagonal form factor $\langle\theta,\theta'_1,\dots,\theta'_n|\mathcal{O}|\theta_n,\dots,\theta_1\rangle_L$ 
and sending $\theta\to\infty$, the extra particle decouples and we can approach the diagonal 
form factor. This can be achieved by choosing the same quantization numbers for 
both the $\theta_j$ and $\theta'_j$ particles: 

$$Q'_j \equiv p(\theta'_j)L - i\sum_{k:k\neq j}\log S(\theta'_j-\theta'_k) - i\log S(\theta'_j-\theta) = 2\pi I_j \tag{19}$$

Indeed, by sending (the quantization number of) $\theta$ to infinity the BY equations $Q'_j$ 
reduce to $Q_j$. This means that in the limit considered $\theta'_j\to\theta_j$, i.e. $\epsilon_j\to0$ with 
$\epsilon_j = \theta'_j-\theta_j$. In principle, 
$\epsilon_j$ depends on $\{\theta_j\}$ and on the way $\theta$ goes to infinity. 
For finite $\theta$, the form factor is non-diagonal and we can use 

$$\langle\theta,\theta'_1,\dots,\theta'_n|\mathcal{O}|\theta_n,\dots,\theta_1\rangle_L = \frac{F_{2n+1}(\bar\theta,\bar\theta'_1,\dots,\bar\theta'_n,\theta_n,\dots,\theta_1)}{\sqrt{\rho'_{n+1}\rho_n}} + O(e^{-mL}) \tag{20}$$

The numerator is a finite quantity for any $\theta$ and has a finite $\theta\to\infty$ limit accordingly. 
We can see in the limit that $\rho'_{n+1}(\theta,\theta'_1,\dots,\theta'_n)$ goes to $\rho_1(\theta)\rho_n(\theta_1,\dots,\theta_n)$. 


Similarly, for the form factors $F_{2n+1}(\bar\theta,\bar\theta'_1,\dots,\bar\theta'_n,\theta_n,\dots,\theta_1)$ the cluster property 
guarantees the factorization $F_{2n}(\bar\theta'_1,\dots,\bar\theta'_n,\theta_n,\dots,\theta_1)F_1(\bar\theta)$, where additionally 
$\theta'_j\to\theta_j$. Actually the expression depends on the direction in which we take the limit where 
all $\epsilon_i$ go to zero, and our main task is to calculate this limit explicitly. Fortunately, 
the direction is dictated by the difference of the BY equations: 

$$Q'_j - Q_j = E_jL\epsilon_j + \sum_{k:k\neq j}\phi_{jk}(\epsilon_j-\epsilon_k) - \delta_j = \sum_kQ_{jk}\epsilon_k - \delta_j = 0 \tag{21}$$

where we have used the notation $\delta_j = i\log S(\theta_j-\theta)$. 

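Clearly the $\delta_j$'s are small and so are the $\epsilon_j$'s. In the following we analyze the $\epsilon$ and 
$\delta$ dependence of the form factor $F_{2n+1}(\bar\theta,\bar\theta'_1,\dots,\bar\theta'_n,\theta_n,\dots,\theta_1)$. Similarly to the 
diagonal limit of form factors we can describe the $\delta$ and $\epsilon$ dependence by graphs. 
We claim that 

$$F_{2n+1}(\bar\theta,\bar\theta'_1,\dots,\bar\theta'_n,\theta_n,\dots,\theta_1) = \sum_{\text{allowed graphs, colorings}}F(\text{graph}) + O(\epsilon_i,\delta) \tag{22}$$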


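where, additionally to the previous graphs in (8), we should allow the coloring of 
those vertices which do not have any outgoing edge, i.e. they can be either black 
or white. For each black dot with label $i$ we associate a factor $\frac{\delta_i}{\epsilon_i}$. Note that in the 
$\theta\to\infty$ limit we will have an overall $F_1(\bar\theta)$ factor, which we factor out. 
Let us see how it works for $n = 1$. The single dot can be either black or white: 

[Figure: the single vertex 1, drawn once as a white dot and once as a black dot] 

thus the two contributions are 

$$F_3(\bar\theta,\bar\theta'_1,\theta_1)F_1(\bar\theta)^{-1} = \frac{\delta_1}{\epsilon_1} + F_1^c(\theta_1) + \dots \tag{23}$$

where the ellipsis represents terms vanishing in the $\delta,\epsilon\to0$ limit. Let us show that 
$F_1^c(\theta_1)$ is not singular, i.e. the singularity of the lhs. is exactly $\frac{\delta_1}{\epsilon_1}$. The kinematical 
residue equation tells us that 

$$F_3(\bar\theta,\bar\theta'_1,\theta_1) = \frac{i}{\epsilon_1}\left(1 - S(\theta_1-\theta+i\pi)\right)F_1(\bar\theta) + O(1) = \frac{\delta_1}{\epsilon_1}F_1(\bar\theta) + O(1) \tag{24}$$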




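Thus, once the singularity is subtracted, we can safely take the $\epsilon_1\to0$ and the 
$\delta\to0$ limits leading to 

$$\lim_{\delta,\epsilon_1\to0}\left(F_3(\bar\theta,\bar\theta'_1,\theta_1) - \frac{\delta_1}{\epsilon_1}F_1(\bar\theta)\right) = F_1^c(\theta_1)F_1(\bar\theta) \tag{25}$$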


where we used the cluster property of form factors and the fact that the two particle 
diagonal connected form factor is non-singular. 

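Now we adapt the proof in the induction step in (8) by noticing that the $\epsilon_n^{-1}$ 
singularity can come either from terms with only one outgoing edge or from being 
black. Thus the residue is 

$$\frac{1}{\epsilon_n}(\delta_n + \epsilon_1\phi_{1n}+\dots+\epsilon_{n-1}\phi_{n-1\,n})\,F_{2n-1}(\bar\theta,\bar\theta_1+\epsilon_1,\dots,\bar\theta_{n-1}+\epsilon_{n-1},\theta_{n-1},\dots,\theta_1) \tag{26}$$

Let us calculate the analogous term from the kinematical residue axiom: 

$$F_{2n+1}(\bar\theta,\bar\theta'_1,\dots,\bar\theta'_n,\theta_n,\dots,\theta_1) \to \frac{i}{\epsilon_n}\left(1 - \frac{S(\theta_n-\theta_{n-1})\cdots S(\theta_n-\theta_1)}{S(\theta'_n-\theta'_{n-1})\cdots S(\theta'_n-\theta'_1)}\,\frac{1}{S(\theta_n-\theta)}\right)\times F_{2n-1}(\bar\theta,\bar\theta'_1,\dots,\bar\theta'_{n-1},\theta_{n-1},\dots,\theta_1) \tag{27}$$

The bracket can be expanded as 

$$\left(\;\cdot\;\right) = -i(\delta_n + \phi_{n\,n-1}\epsilon_{n-1}+\dots+\phi_{n1}\epsilon_1) \tag{28}$$

which completes the induction. 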
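In particular, for two particles we have the following diagrams: 

[Figure: the eight allowed colored graphs on the vertices 1 and 2: the four black/white colorings of the two disconnected vertices and, for each of the two single-edge graphs, the two colorings of the vertex without an outgoing edge] 

which lead to the formula 

$$F_5(\bar\theta,\bar\theta'_1,\bar\theta'_2,\theta_2,\theta_1)F_1(\bar\theta)^{-1} = F_2^c(\theta_1,\theta_2) + \frac{\epsilon_2}{\epsilon_1}\phi_{21}\frac{\delta_2}{\epsilon_2} + \frac{\epsilon_1}{\epsilon_2}\phi_{12}\frac{\delta_1}{\epsilon_1} + \frac{\delta_1\delta_2}{\epsilon_1\epsilon_2} + \frac{\epsilon_1}{\epsilon_2}\phi_{12}F_1^c(\theta_1) + \frac{\delta_2}{\epsilon_2}F_1^c(\theta_1) + \frac{\epsilon_2}{\epsilon_1}\phi_{21}F_1^c(\theta_2) + \frac{\delta_1}{\epsilon_1}F_1^c(\theta_2) \tag{29}$$

It is interesting to check the coefficient of $F_1^c(\theta_1)$: 

$$\frac{\epsilon_1\phi_{12}+\delta_2}{\epsilon_2} = E_2L+\phi_{21} = \rho_1(\theta_2) \tag{30}$$

where we used the BY equations. Similarly 

$$\frac{\delta_1\delta_2}{\epsilon_1\epsilon_2} + \frac{\epsilon_1}{\epsilon_2}\phi_{12}\frac{\delta_1}{\epsilon_1} + \frac{\epsilon_2}{\epsilon_1}\phi_{21}\frac{\delta_2}{\epsilon_2} = \rho_2(\theta_1,\theta_2) \tag{31}$$

which leads to the sought for formula for $n = 2$: 

$$F_5(\bar\theta,\bar\theta'_1,\bar\theta'_2,\theta_2,\theta_1)F_1(\bar\theta)^{-1} = F_2^c(\theta_1,\theta_2) + \rho_1(\theta_2)F_1^c(\theta_1) + \rho_1(\theta_1)F_1^c(\theta_2) + \rho_2(\theta_1,\theta_2) \tag{32}$$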
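
The step from (29) to (32) is a purely algebraic consequence of the BY difference equations (21), and it can be confirmed with exact rational arithmetic. The sketch below is an illustration under stated assumptions: the numbers standing in for E_i L, the phase derivative, the shifts and the connected form factors are arbitrary, and the variable names are hypothetical.

```python
from fractions import Fraction as Fr

# Arbitrary stand-in values (assumptions, not physical data).
E1L, E2L, phi = Fr(3), Fr(5, 2), Fr(1, 3)      # phi = phi_12 = phi_21
eps1, eps2 = Fr(1, 7), Fr(2, 11)
F2c, F1c_1, F1c_2 = Fr(4, 5), Fr(-2, 3), Fr(9, 4)

# The BY difference equations (21) fix delta_j in terms of the epsilons.
d1 = E1L * eps1 + phi * (eps1 - eps2)
d2 = E2L * eps2 + phi * (eps2 - eps1)

# Sum of the eight colored-graph contributions, as listed in (29).
lhs = (F2c
       + (eps2 / eps1) * phi * (d2 / eps2)
       + (eps1 / eps2) * phi * (d1 / eps1)
       + (d1 * d2) / (eps1 * eps2)
       + (eps1 / eps2) * phi * F1c_1 + (d2 / eps2) * F1c_1
       + (eps2 / eps1) * phi * F1c_2 + (d1 / eps1) * F1c_2)

# Densities of states for two particles, as defined below (17).
rho1_theta1 = E1L + phi
rho1_theta2 = E2L + phi
rho2 = rho1_theta1 * rho1_theta2 - phi * phi

# Right-hand side of (32).
rhs = F2c + rho1_theta2 * F1c_1 + rho1_theta1 * F1c_2 + rho2
print(lhs == rhs)
```

All dependence on the individual shifts drops out once (21) is imposed, which is the mechanism the general induction argument relies on.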


In the following we prove the form of the diagonal form factors in the general 
case by induction. First we notice that once we use the BY equations to express $\delta_i$ 
in terms of $\epsilon_k$ then all denominators of $\epsilon$'s disappear. Focus on $\epsilon_n^{-1}$ and observe that 

$$\delta_n + \epsilon_1\phi_{1n}+\dots+\epsilon_{n-1}\phi_{n-1\,n} = \epsilon_n(E_nL+\phi_{n-1\,n}+\dots+\phi_{1n}) \tag{33}$$

This implies that the diagonal finite volume form factor is a polynomial in $L$ and 
linear in each $E_kL$. We first check the $L = 0$ piece and then calculate the derivative 
wrt. $E_nL$ as the full expression is symmetric in all variables. Note that the naively 
singular term in $\epsilon_n$ at $L = 0$ takes the form: 

$$\frac{1}{\epsilon_n}\,\epsilon_n(E_nL+\phi_{n-1\,n}+\dots+\phi_{1n})\big|_{L=0} = \frac{1}{\epsilon}(\epsilon\phi_{n-1\,n}+\dots+\epsilon\phi_{1n}) \tag{34}$$

which is exactly what we would obtain if we had calculated the diagonal limit of 
the form factor in the symmetric evaluation, i.e. for $L = 0$ we obtain the symmetric 
$n$-particle form factor. We now check the linear term in $E_nL$. In doing so we 
differentiate the expression (22) wrt. $E_nL$: 

$$\partial_{E_nL}F_{2n+1}(\bar\theta,\bar\theta'_1,\dots,\bar\theta'_n,\theta_n,\dots,\theta_1) = F_{2n-1}(\bar\theta,\bar\theta'_1,\dots,\bar\theta'_{n-1},\theta_{n-1},\dots,\theta_1) \tag{35}$$

since the term $E_nL$ can come only through the singularity at $\epsilon_n = 0$. Note that on 
the rhs. $\theta_k$ satisfies the original BY equations and not the one where $\theta_n$ is missing. 
Let us now take a look at the expression we would like to prove: 


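$$F_{2n+1}(\bar\theta,\bar\theta'_1,\dots,\bar\theta'_n,\theta_n,\dots,\theta_1)F_1(\bar\theta)^{-1} = \sum_{\alpha\cup\bar\alpha=I}F^c_\alpha\rho_{\bar\alpha} = \sum_{\alpha\cup\bar\alpha=I}F^s_\alpha\rho^s_{\bar\alpha} \tag{36}$$

where $I = \{1,\dots,n\}$. Clearly the rhs. is also a polynomial in $L$, which is linear in 
each $E_kL$. To finish the proof, we note that the $L = 0$ constant part of the rhs. is 
the symmetric form factor. Using that $\partial_{E_nL}\rho^s_\alpha = \rho^s_{\alpha\setminus\{n\}}$ if $n\in\alpha$ and 0 otherwise we 
can see that 

$$\partial_{E_nL}\sum_{\alpha\cup\bar\alpha=I}F^s_\alpha\rho^s_{\bar\alpha} = \sum_{\beta\cup\bar\beta=I\setminus\{n\}}F^s_\beta\rho^s_{\bar\beta} = F_{2n-1}(\bar\theta,\bar\theta'_1,\dots,\bar\theta'_{n-1},\theta_{n-1},\dots,\theta_1)F_1(\bar\theta)^{-1} \tag{37}$$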


by the induction hypothesis, which completes the proof. 


4 Conclusion 


In this paper we proved the large volume expression for the diagonal form factors 
by carefully taking the limit of a nondiagonal form factor. Our result completes the 
proof of the LM formula, which describes exactly the one-point function in finite 
volume. 

Diagonal finite volume form factors are relevant in the AdS/CFT correspondence 
as they are conjectured to describe the Heavy-Heavy-Light (HHL) type three point 
functions of the maximally supersymmetric 4D gauge theory [3]. This conjecture 
was first proved at weak coupling [6] then at strong coupling [2], finally for all 
couplings in [7, 8]. We have profited from all of these proofs and used them in the 
present paper. 

There is a natural extension of our results for diagonal form factors in non- 
diagonal theories. Clearly the same idea of adding one more particle and sending 
its rapidity to infinity can be applied there too, and research in this direction is 
ongoing. 


Narayana Number, Chebyshev 
Polynomial and Motzkin Path on RNA 
Abstract Shapes 


Sang Kwan Choi, Chaiho Rim, and Hwajin Um 


Abstract We consider a certain abstraction of RNA secondary structures, which is 
closely related to so-called RNA shapes. The generating function counting the 
number of the abstract structures is obtained in three different ways, namely, by 
means of Narayana numbers, Chebyshev polynomials and Motzkin paths. We show 
that a combinatorial interpretation on 2-Motzkin paths explains a relation between 
Motzkin paths and RNA shapes and also provides an identity related to Narayana 
numbers and Motzkin polynomial coefficients. 


1 Introduction 


Ribonucleic acid (RNA) is a single stranded molecule with a backbone of 
nucleotides, each of which has one of the four bases, adenine (A), cytosine (C), 
guanine (G) and uracil (U). Base pairs are formed intra-molecularly between A-U, 
G-C or G-U, leading the sequence of bases to form helical regions. The primary 
structure of an RNA is merely the sequence of bases and its three-dimensional 
conformation by base pairs is called the tertiary structure. As an intermediate 
structure between the primary and the tertiary, the secondary structure is a planar 
structure allowing only nested base pairs. This is easy to see in its diagrammatic 
representation, see Fig. 1. A sequence of n bases is that of labeled vertices 
(1, 2, ..., n) in a horizontal line and base pairs are drawn as arcs in the upper 
half-plane. The condition of nested base pairs means non-crossing arcs: for two 
arcs (i, j) and (k, l) where i < j, k < l and i < k, either i < j < k < l or 


S. K. Choi 

Center for Theoretical Physics, College of Physical Science and Technology, Sichuan University, 
Chengdu, China 

e-mail: hermit1231@sogang.ac.kr 


C. Rim (✉) · H. Um 
Department of Physics, Sogang University, Seoul, South Korea 
e-mail: rimpine@sogang.ac.kr; um16@sogang.ac.kr 


© Springer Nature Switzerland AG 2019 
D. R. Wood et al. (eds.), 2017 MATRIX Annals, MATRIX Book Series 2, 
https://doi.org/10.1007/978-3-030-04161-8_11 




Fig. 1 Representations of secondary structures. The RNA structure on the left hand side is 
represented as the diagram (top right) and the dot-bracket string (bottom right) 


i < k < l < j. Since the functional role of an RNA depends mainly on its 3D 
conformation, prediction of RNA folding from the primary structure has long 
been an important problem in molecular biology. The most common approach 
for the prediction is free energy minimization and many algorithms to compute 
the structures with minimum free energy have been developed (see for instance, 
[13, 17, 21, 22]). 
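
The non-crossing condition on arcs can be checked mechanically. The sketch below is an illustration only: it reads base pairs from a dot-bracket string of the kind shown in Fig. 1 and tests the condition stated above; the function names and the example strings are assumptions, not data from the text.

```python
def arcs_from_dot_bracket(s):
    """Return the base pairs (i, j), 1-indexed with i < j, of a dot-bracket string."""
    stack, arcs = [], []
    for pos, ch in enumerate(s, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            arcs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return sorted(arcs)

def is_noncrossing(arcs):
    """For arcs (i, j), (k, l) with i < k, require i < j < k < l or i < k < l < j."""
    arcs = sorted(arcs)
    for a, (i, j) in enumerate(arcs):
        for k, l in arcs[a + 1:]:
            # Given i < j, k < l and i < k, the two cases reduce to j < k or l < j.
            if not (j < k or l < j):
                return False
    return True

print(arcs_from_dot_bracket("((..)).()"))
print(is_noncrossing([(1, 3), (2, 4)]))
```

Any balanced dot-bracket string yields non-crossing arcs by construction, so the explicit check matters mainly for arc lists obtained in other ways.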

On the other hand, RNA structures are often considered as combinatorial objects 
in terms of representations such as strings over finite alphabets, linear trees or the 
diagrams. Combinatorial approaches enumerate the number of possible structures 
under various kinds of constraints and observe its statistics to compare with 
experimental findings [1, 4, 9, 16, 18]. They also provide classifications of structures 
to advance prediction algorithms [8, 14, 15, 20]. 

In this paper, we consider a certain abstraction of secondary structures from a 
purely combinatorial point of view, regardless of primary structures. The abstract 
structure is, in fact, closely related to so-called RNA shapes [8, 10, 12], see Sect. 3. 
Although we will consider it apart from prediction algorithms, let us briefly review 
the background to RNA shapes in the context of the prediction problem. In the free 
energy minimization scheme, the lowest free energy structures are not necessarily 
native structures. One needs to search suboptimal foldings in a certain energy 
bandwidth and, in general, obtains a huge set of suboptimal foldings. RNA shapes 
classify the foldings according to their structural similarities and provide so-called 
shape representatives such that native structures can be found among those shape 
representatives. Consequently, it can greatly narrow down the huge set of suboptimal 
foldings to probe in order to find native structures. 

In the following preliminary, we introduce our combinatorial object, which
we call island diagrams, and present the basic definitions needed to describe the
diagrams. In Sect. 2, we find the generating function counting the number of island
diagrams in three different ways, through which one may see the intertwining
relations between Narayana numbers, Chebyshev polynomials and Motzkin paths.
In particular, we find a combinatorial identity, see Eq. (15), which reproduces the
following two identities that Coker provided [5] (see also [3] for a combinatorial
interpretation):

$$\sum_{k=1}^{n} \frac{1}{n}\binom{n}{k}\binom{n}{k-1} x^{k-1} = \sum_{k=0}^{\lfloor (n-1)/2 \rfloor} C_k \binom{n-1}{2k} x^{k} (1+x)^{n-2k-1} \tag{1}$$

$$\sum_{k=1}^{n} \frac{1}{n}\binom{n}{k}\binom{n}{k-1} x^{2(k-1)} (1+x)^{2(n-k)} = \sum_{k=1}^{n} C_k \binom{n-1}{k-1} x^{k-1} (1+x)^{k-1} \tag{2}$$

where $C_k$ is the Catalan number defined by $C_k = \frac{1}{k+1}\binom{2k}{k}$ for $k \geq 0$. We also provide
a combinatorial interpretation on 2-Motzkin paths to explain the identity (15).
The interpretation implies the bijection between $\pi$-shapes and Motzkin paths which
was shown in [7, 11].
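As a quick numerical sanity check (not part of the original argument), the two identities of Coker can be tested for small $n$ with exact integer arithmetic. The sketch below assumes the standard forms of the identities, with Narayana numbers on the left and Catalan numbers on the right; all function names are illustrative.

```python
from math import comb


def catalan(k):
    """Catalan number C_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)


def narayana(n, k):
    """Narayana number N(n, k) = (1/n) binom(n, k) binom(n, k-1)."""
    return comb(n, k) * comb(n, k - 1) // n


def coker_first(n, x):
    """Both sides of identity (1), evaluated at an integer x."""
    lhs = sum(narayana(n, k) * x ** (k - 1) for k in range(1, n + 1))
    rhs = sum(
        catalan(k) * comb(n - 1, 2 * k) * x ** k * (1 + x) ** (n - 2 * k - 1)
        for k in range((n - 1) // 2 + 1)
    )
    return lhs, rhs


def coker_second(n, x):
    """Both sides of identity (2), evaluated at an integer x."""
    lhs = sum(
        narayana(n, k) * x ** (2 * (k - 1)) * (1 + x) ** (2 * (n - k))
        for k in range(1, n + 1)
    )
    rhs = sum(
        catalan(k) * comb(n - 1, k - 1) * (x * (1 + x)) ** (k - 1)
        for k in range(1, n + 1)
    )
    return lhs, rhs


if __name__ == "__main__":
    for n in range(1, 9):
        for x in range(5):
            assert coker_first(n, x)[0] == coker_first(n, x)[1]
            assert coker_second(n, x)[0] == coker_second(n, x)[1]
    print("identities (1) and (2) hold for n = 1..8, x = 0..4")
```

Evaluating at a handful of integer points is not a proof, but it catches misplaced indices or exponents quickly.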


1.1 Preliminary 


A formal definition of secondary structures is given as follows: 


Definition 1 (Waterman [20]) A secondary structure is a vertex-labeled graph on
$n$ vertices with an adjacency matrix $A = (a_{ij})$ (whose element $a_{ij} = 1$ if $i$ and
$j$ are adjacent, and $a_{ij} = 0$ otherwise, with $a_{ii} = 0$) fulfilling the following three
conditions:

1. $a_{i,i+1} = 1$ for $1 \leq i \leq n-1$.
2. For each fixed $i$, there is at most one $a_{ij} = 1$ where $j \neq i \pm 1$.
3. If $a_{ij} = a_{kl} = 1$, where $i < k < j$, then $i \leq l \leq j$.

An edge $(i, j)$ with $|i - j| \neq 1$ is said to be a base pair and a vertex $i$ connected only
to $i - 1$ and $i + 1$ is called unpaired. We will call an edge $(i, i+1)$, $1 \leq i \leq n-1$,
a backbone edge. Note that a base pair between two adjacent vertices is not allowed
by definition and the second condition implies non-existence of base triples.

There are many representations of secondary structures other than the diagrammatic
representation. In this paper, we often use the so-called dot-bracket
representation, see Fig. 1. A secondary structure can be represented as a string $S$
over the alphabet {“(”, “)”, “.”} by the following rules [9]:

1. If vertex $i$ is unpaired then $S_i$ = “.”.
2. If $(i, j)$ is a base pair and $i < j$ then $S_i$ = “(” and $S_j$ = “)”.

In the following, we present the basic definitions of structure elements needed 
for our investigations. 
Fig. 2 Structure elements of secondary structures (labeled in the figure: stack, hairpin, island, bulge, interior loop, tail)


Definition 2 A secondary structure on $(1, 2, \dots, n)$ consists of the following
structure elements (cf. Fig. 2). By a base pair $(i, j)$, we always assume $i < j$.

1. The sequence of unpaired vertices $(i+1, i+2, \dots, j-1)$ is a hairpin if $(i, j)$
is a base pair. The pair $(i, j)$ is said to be the foundation of the hairpin.

2. The sequence of unpaired vertices $(i+1, i+2, \dots, j-1)$ is a bulge if either
$(k, j)$, $(k+1, i)$ or $(i, k+1)$, $(j, k)$ are base pairs.

3. The sequence of unpaired vertices $(i+1, i+2, \dots, j-1)$ is a join if $(k, i)$ and
$(j, l)$ are base pairs.

4. A tail is a sequence of unpaired vertices $(1, 2, \dots, i-1)$, resp. $(j+1, j+2, \dots, n)$,
such that $i$, resp. $j$, is paired.

5. An interior loop is two sequences of unpaired vertices $(i+1, i+2, \dots, j-1)$ and
$(k+1, k+2, \dots, l-1)$ such that $(i, l)$ and $(j, k)$ are pairs, where $i < j < k < l$.

6. For any $k \geq 3$ and $0 \leq l, m \leq k$ with $l + m = k$, a multiloop is $l$
sequences of unpaired vertices and $m$ empty sequences $(i_1+1, \dots, j_1-1)$,
$(i_2+1, \dots, j_2-1)$, $\dots$, $(i_k+1, \dots, j_k-1)$ such that $(i_1, j_k)$, $(j_1, i_2)$, $\dots$,
$(j_{k-1}, i_k)$ are base pairs. Here, a sequence $(i+1, \dots, j-1)$ is an empty sequence if
$i + 1 = j$.

7. A stack (or stem) consists of uninterrupted base pairs $(i+1, j-1)$, $(i+2, j-2)$,
$\dots$, $(i+k, j-k)$ such that neither $(i, j)$ nor $(i+k+1, j-k-1)$ is a base
pair. Here the length of the stack is $k$.


Note that, while other structure elements consist of at least one vertex, a multiloop 
does not necessarily have a vertex. In the diagrammatic representation, a multiloop 
is a structure bounded by three or more base pairs and backbone edges. 


Definition 3 An island is a sequence of paired vertices $(i, i+1, \dots, j)$ such that
one of the following holds:

1. $i - 1$ and $j + 1$ are both unpaired, where $1 < i \leq j < n$.
2. $j + 1$ is unpaired, where $i = 1$ and $1 \leq j < n$.
3. $i - 1$ is unpaired, where $1 < i \leq n$ and $j = n$.


Now we introduce the abstract structures to consider throughout this paper. From 
here on, we will call the structures island diagrams for convenience. An island 
diagram (cf. Fig. 3) is obtained from secondary structures by 


Fig. 3 An example of island diagrams. This island diagram is the abstract structure of the 
secondary structure given in Fig. 2 


1. Removing tails. 
2. Representing a sequence of consecutive unpaired vertices between two islands 
by a single blank. 


Accordingly, we retain unpaired regions except for tails but do not account for the
number of unpaired vertices. In terms of the dot-bracket representation, we shall
use the underscore “_” for the blank: for example, the island diagram “((_)_)”
abstracts the secondary structure “((...)....)”. Since the abstraction preserves all
the structure elements (except for tails) in Definition 2, we will use them to describe
island diagrams in such a way that, for instance, the blank is a hairpin if its left and
right vertices are paired to each other.
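The abstraction from a dot-bracket string to an island diagram can be sketched in a few lines. The code below is an illustration under the conventions just described (tails dropped, every run of unpaired vertices replaced by one underscore); the function names are hypothetical.

```python
import re


def island_diagram(dot_bracket):
    """Abstract a dot-bracket string into its island diagram.

    Step 1 removes the tails (leading and trailing dots); step 2 replaces
    each remaining run of consecutive dots by a single blank "_".
    """
    without_tails = dot_bracket.strip(".")
    return re.sub(r"\.+", "_", without_tails)


def diagram_features(diagram):
    """Return (hairpins, islands, basepairs) of an island diagram string."""
    hairpins = diagram.count("(_)")      # a blank whose neighbours are paired together
    islands = diagram.count("_") + 1     # blanks separate consecutive islands
    basepairs = diagram.count("(")
    return hairpins, islands, basepairs


if __name__ == "__main__":
    d = island_diagram("..((...)....)...")
    print(d, diagram_features(d))
```

The triple returned by `diagram_features` corresponds to the exponents of $x$, $y$ and $z$ used in the generating function of Sect. 2.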


2 Generating Function 


We enumerate the number of island diagrams $g(h, I, \ell)$, filtered by the number of
hairpins ($h$), islands ($I$) and basepairs ($\ell$). Let $G(x, y, z) = \sum_{h, I, \ell} g(h, I, \ell)\, x^h y^I z^\ell$
denote the corresponding generating function. We obtain the generating function in
three different ways, by means of Narayana numbers, Chebyshev polynomials and
Motzkin paths. In particular, we provide a bijection between 2-Motzkin paths
and sequences of matching brackets.


2.1 Narayana Number 


The easiest way to obtain the generating function $G(x, y, z)$ is to use a combinatorial
interpretation of the Narayana numbers, which are defined by

$$N(n, k) = \frac{1}{n}\binom{n}{k}\binom{n}{k-1}, \qquad 1 \leq k \leq n. \tag{3}$$

The Narayana number $N(n, k)$ counts the number of ways of arranging $n$ pairs of
brackets so that they are correctly matched and contain exactly $k$ pairs as “()”. For instance, the bracket
representations for $N(4, 2) = 6$ are given as follows:

()((()))   (())(())   ((()))()   (()(()))   ((())())   ((()()))

It is easy to recover island diagrams from this representation.


Proposition 1 The generating function has the form

$$G(x, y, z) = \sum_{\ell, h} N(\ell, h)\, x^h y^{h+1} (1+y)^{2\ell-1-h} z^\ell. \tag{4}$$

Its closed form is

$$G(x, y, z) = \frac{y}{1+y} \cdot \frac{1 - A(1+B) - \sqrt{1 - 2A(1+B) + A^2(1-B)^2}}{2A}, \tag{5}$$

where $A = z(1+y)^2$ and $B = xy/(1+y)$.

Proof One may immediately associate bracket representations of the Narayana
numbers with island diagrams. Without regard to underscores, a pair of brackets is
associated with a basepair and the sub-pattern “()” corresponds to the foundation
of a hairpin. This clearly explains the factor $N(\ell, h)\, x^h z^\ell$. Now we consider the
insertions of underscores to recover the string representation of island diagrams.
Recall that, in secondary structures, a hairpin consists of at least one unpaired
vertex. Therefore, the foundation of a hairpin “()” must contain an underscore,
“(_)”. These $h$ underscores are inserted first; since $k$ underscores separate $k+1$
islands, we have the factor $y^{h+1}$.
After the insertion of hairpin underscores, there are $2\ell - 1 - h$ places left to
possibly insert underscores. All possible insertions are summarized
by the factor $(1+y)^{2\ell-1-h}$. The generating function of the Narayana numbers is
well known (see for instance [2]), so that one can write the closed form.
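The counting argument of the proof can be checked by brute force for small numbers of basepairs. The sketch below enumerates all strings of matching brackets, tries an underscore in every gap, keeps the strings in which each “()” has received its blank, and compares the tally with the coefficients predicted by the Narayana form of the generating function, namely $N(\ell, h)\binom{2\ell-1-h}{I-h-1}$ for $h$ hairpins and $I$ islands. The helper names are illustrative.

```python
from collections import Counter
from itertools import product
from math import comb


def dyck_words(n):
    """Yield every correctly matched string of n bracket pairs."""
    def extend(prefix, opened, closed):
        if closed == n:
            yield prefix
            return
        if opened < n:
            yield from extend(prefix + "(", opened + 1, closed)
        if closed < opened:
            yield from extend(prefix + ")", opened, closed + 1)
    yield from extend("", 0, 0)


def narayana(n, k):
    return comb(n, k) * comb(n, k - 1) // n


def island_diagrams(n):
    """Try an underscore in each of the 2n-1 gaps of every Dyck word."""
    for word in dyck_words(n):
        for mask in product((False, True), repeat=2 * n - 1):
            s = word[0] + "".join(
                ("_" if blank else "") + ch for blank, ch in zip(mask, word[1:])
            )
            if "()" not in s:            # every hairpin needs a blank
                yield s


def brute_force_counts(n):
    """Counter: (hairpins, islands) -> number of island diagrams with n basepairs."""
    return Counter((s.count("(_)"), s.count("_") + 1) for s in island_diagrams(n))


def predicted_counts(n):
    """Same table, read off from N(n,h) x^h y^(h+1) (1+y)^(2n-1-h)."""
    counts = Counter()
    for h in range(1, n + 1):
        free = 2 * n - 1 - h             # gaps that may or may not hold a blank
        for extra in range(free + 1):
            counts[(h, h + 1 + extra)] += narayana(n, h) * comb(free, extra)
    return counts


if __name__ == "__main__":
    for n in range(1, 5):
        assert brute_force_counts(n) == predicted_counts(n)
    print(sorted(brute_force_counts(2).items()))
```

The enumeration grows exponentially, so it is only meant for very small $\ell$.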


2.2 Chebyshev Polynomial 


One can also count the number of island diagrams by using the Chebyshev
polynomials of the second kind, which are defined by the recurrence relation:

$$U_0(\xi) = 1, \quad U_1(\xi) = 2\xi, \quad U_{n+1}(\xi) = 2\xi\, U_n(\xi) - U_{n-1}(\xi). \tag{6}$$

The product of the polynomials expands as

$$U_m(\xi)\, U_n(\xi) = \sum_{k=0}^{n} U_{m-n+2k}(\xi) \quad \text{for } n \leq m. \tag{7}$$

The relation between island diagrams and Chebyshev polynomials is based on the
Feynman diagrams of the Hermitian matrix model; see [4]. One may gain some
insight from the simplest example:

$$U_2 \times U_2 = U_4 + U_2 + U_0.$$

The polynomial $U_k$ corresponds to an island with $k$ vertices. The product $U_2 U_2$
expands to $U_4$ (no basepair), $U_2$ (one basepair) and $U_0$ (all vertices are paired). The
island diagram is the one associated with $U_0$ in the expansion of the product. In
general, we have the following theorem. See [4] for its proof.


Theorem 1 Suppose that there are $I$ islands, the $a$-th of which
has $k_a \geq 1$ vertices for $a \in \{1, \dots, I\}$. The number of island diagrams one finds by
making base pairs is given by

$$\Big( \prod_{a=1}^{I} U_{k_a},\, U_0 \Big) = \frac{2}{\pi} \int_{-1}^{1} \prod_{a=1}^{I} U_{k_a}(\xi)\, \sqrt{1-\xi^2}\; d\xi, \tag{8}$$

where $U_k(\xi)$ is the Chebyshev polynomial of the second kind of degree $k$.

The Chebyshev polynomials of the second kind are orthogonal with respect to the
weight $\sqrt{1-\xi^2}$: $(U_m, U_n) = \delta_{m,n}$. Thus, Theorem 1 means that the number of
island diagrams is the coefficient of $U_0 = 1$ when the product $\prod_{a=1}^{I} U_{k_a}(\xi)$ is expanded
as a linear combination of Chebyshev polynomials.

In order to reproduce the generating function given in (4), we need to take
the number of hairpins into account as well. Let us first consider the case of
island diagrams in which every blank (underscore) is a hairpin. A hairpin is
accompanied by its foundation, that is, $h$ basepairs are assigned
as the foundations. Since those basepairs are the most nested ones, the number
of such island diagrams is simply given by
$\big( U_{k_1-1} \prod_{i=2}^{h} U_{k_i-2}\, U_{k_{h+1}-1},\, U_0 \big)$. The
foundations of the hairpins take one vertex from each of the two outermost islands and two
vertices from each of the others. In fact, the island diagrams having only hairpins are no
different from the strings of matching brackets which represent Narayana numbers, as
shown in the previous subsection. By just putting “(_)” → “()”, we recover the bracket
representations. Thus, we have the following corollary:

Corollary 1.1 For any $\ell \in \mathbb{N}$ and $1 \leq h \leq \ell$,

$$N(\ell, h) = \sum_{k_1 + \cdots + k_{h+1} = 2(\ell-h)} \Big( \prod_{a=1}^{h+1} U_{k_a},\, U_0 \Big), \tag{9}$$

where $k_a$ for $a \in \{1, \dots, h+1\}$ are non-negative integers.
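Corollary 1.1 can be tested without any integration, because the inner product with $U_0$ is just the coefficient of $U_0$ in the linearized product, and the linearization follows from the product rule $U_m U_n = \sum_{k=0}^{n} U_{m-n+2k}$ for $n \leq m$. The sketch below stores an expansion as a dictionary from the index of $U$ to its integer coefficient; the representation and the function names are choices made here for illustration.

```python
from math import comb


def times_U(expansion, n):
    """Multiply a Chebyshev-U expansion {index: coeff} by U_n via the product rule."""
    result = {}
    for m, coeff in expansion.items():
        low, high = min(m, n), max(m, n)
        for k in range(low + 1):
            index = high - low + 2 * k
            result[index] = result.get(index, 0) + coeff
    return result


def u0_coefficient(indices):
    """Coefficient of U_0 in the product of U_k over k in indices."""
    expansion = {0: 1}                   # start from U_0 = 1
    for k in indices:
        expansion = times_U(expansion, k)
    return expansion.get(0, 0)


def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def narayana(n, k):
    return comb(n, k) * comb(n, k - 1) // n


def narayana_via_chebyshev(l, h):
    """Right hand side of Corollary 1.1."""
    return sum(u0_coefficient(ks) for ks in compositions(2 * (l - h), h + 1))


if __name__ == "__main__":
    for l in range(1, 6):
        for h in range(1, l + 1):
            assert narayana_via_chebyshev(l, h) == narayana(l, h)
    print(times_U({2: 1}, 2))            # expansion of U_2 * U_2
```

The last line reproduces the simplest example of the text: $U_2 U_2$ has coefficient one on each of $U_0$, $U_2$ and $U_4$.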
Now we find the generating function $G(x, y, z)$. Note that a basepair must be
made across at least one hairpin. Conversely, no basepair can be made amongst
consecutive islands that do not have a hairpin in between. We regard a group of
maximally consecutive islands with no hairpin in between as one effective island.
Then, the backbone of an island diagram can be seen as an alternating arrangement of
effective islands and hairpins. This is nothing but the case in which every blank is a hairpin.
One additional thing to consider is the number of ways to make an effective island
having $k_a$ vertices out of $I_a$ islands, which is given by $\binom{k_a-1}{I_a-1}$. Therefore, we find

$$g(h, I, \ell) = \sum_{\{k_a, I_a\}} \prod_{a=1}^{h+1} \binom{k_a-1}{I_a-1} \Big( U_{k_1-1} \prod_{i=2}^{h} U_{k_i-2}\, U_{k_{h+1}-1},\, U_0 \Big), \tag{10}$$

where the summation runs over $k_1 + \cdots + k_{h+1} = 2\ell$ and $I_1 + \cdots + I_{h+1} = I$. By
means of Corollary 1.1, one can obtain the generating function (4).

We mention that one may also find the generating function by direct calculation 
of the integral in (10). Using the generating function of the Chebyshev polynomial, 


k 
1 
zk/2yi ees ee 
PI 1S) = Ta Rd ye ad ye ve 


the integral is calculated to give

$$G(x, y, z) = \sum_{h} x^h y^{h+1} z^h (1+y)^{h-1}\; {}_2F_1\big(h+1, h; 2; z(1+y)^2\big), \tag{12}$$

where ${}_2F_1(a, b; c; z)$ is the hypergeometric function. One may easily show that
${}_2F_1(h+1, h; 2; z) = \sum_{k \geq 0} N(h+k, h)\, z^k$ and therefore obtains the generating
function (4).


2.3 Motzkin Path 


The generating function $G(x, y, z)$ can also be written in terms of Motzkin
polynomial coefficients. The Motzkin numbers $M_n$ and the Motzkin polynomial
coefficients $M(n, k)$ are defined as

$$M_n = \sum_{k=0}^{\lfloor n/2 \rfloor} M(n, k) \quad \text{where} \quad M(n, k) = \binom{n}{2k} C_k. \tag{13}$$

Let us consider the combinatorial identity in the following theorem. It is easy to
prove using the generating function of the Motzkin polynomials:

$$\sum_{\ell \geq 1} \sum_{p=0}^{\lfloor (\ell-1)/2 \rfloor} M(\ell-1, p)\, A^{\ell-1} B^p = \frac{1 - A - \sqrt{(1-A)^2 - 4A^2 B}}{2A^2 B}. \tag{14}$$

Theorem 2 For any integer $\ell \geq 1$, there holds

$$\sum_{h=1}^{\ell} N(\ell, h)\, x^h y^{h+1} (1+y)^{2\ell-1-h} = x y^2 \sum_{p=0}^{\lfloor (\ell-1)/2 \rfloor} M(\ell-1, p) \big( x y (1+y)^3 \big)^p \big( (1+y)(1+y+xy) \big)^{\ell-2p-1}. \tag{15}$$

Proof The left hand side is $[z^\ell] G(x, y, z)$ given in (4). Multiplying by $z^\ell$ and summing
over $\ell$ on each side, one can check with (14) that the right hand side is indeed
the generating function $G(x, y, z)$.

Note that the identity (15) reproduces Coker's two identities. When we substitute
$x/y$ for $x$, cancel the common factor $xy$ and then put $y = 0$, we get the identity (1).
Furthermore, the substitution $x \to y/(1+y)$ leads to the identity (2).
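Identity (15) lends itself to a direct numerical test. The sketch below evaluates both sides at small integer values of $x$ and $y$ with exact arithmetic, taking the left hand side as $\sum_h N(\ell,h)\,x^h y^{h+1}(1+y)^{2\ell-1-h}$ and the right hand side as $xy^2\sum_p M(\ell-1,p)\,(xy(1+y)^3)^p\,((1+y)(1+y+xy))^{\ell-2p-1}$; the function names are illustrative.

```python
from math import comb


def catalan(k):
    return comb(2 * k, k) // (k + 1)


def narayana(n, k):
    return comb(n, k) * comb(n, k - 1) // n


def motzkin_coeff(n, k):
    """Motzkin polynomial coefficient M(n, k) = binom(n, 2k) C_k."""
    return comb(n, 2 * k) * catalan(k)


def narayana_side(l, x, y):
    """Left hand side of (15): the coefficient of z^l in G, Narayana form."""
    return sum(
        narayana(l, h) * x ** h * y ** (h + 1) * (1 + y) ** (2 * l - 1 - h)
        for h in range(1, l + 1)
    )


def motzkin_side(l, x, y):
    """Right hand side of (15): the same coefficient in Motzkin form."""
    up_down = x * y * (1 + y) ** 3           # weight of one up-down pair
    level = (1 + y) * (1 + y + x * y)        # weight of one horizontal step
    return x * y ** 2 * sum(
        motzkin_coeff(l - 1, p) * up_down ** p * level ** (l - 2 * p - 1)
        for p in range((l - 1) // 2 + 1)
    )


if __name__ == "__main__":
    for l in range(1, 9):
        for x in range(1, 4):
            for y in range(1, 4):
                assert narayana_side(l, x, y) == motzkin_side(l, x, y)
    print(narayana_side(3, 1, 1), motzkin_side(3, 1, 1))
```

Agreement at finitely many points does not replace the generating function proof, but it is a convenient guard against transcription errors in the exponents.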

We will investigate how the right hand side in (15) represents island diagrams.
In order to do that, we need a combinatorial interpretation of 2-Motzkin paths.
Let us first introduce the Motzkin paths, which can also be called 1-Motzkin paths.
A Motzkin path of size $n$ is a lattice path starting at $(0, 0)$ and ending at $(n, 0)$ in
the integer plane $\mathbb{Z} \times \mathbb{Z}$, which satisfies two conditions: (1) It never passes below
the $x$-axis. (2) Its allowed steps are the up step $(1, 1)$, the down step $(1, -1)$ and
the horizontal step $(1, 0)$. We denote by U, D and H an up step, a down step and
a horizontal step, respectively. The Motzkin polynomial coefficient $M(n, k)$ is the
number of Motzkin paths of size $n$ with $k$ up steps. Since the Motzkin number $M_n$
is given by the sum of $M(n, k)$ over the number of up steps, $M_n$ is the number of
Motzkin paths of size $n$. See for instance, the following figure depicting a Motzkin
path of $M(7, 2)$:


¹ In order to deduce the identity (2), one may need Touchard's identity [19]:
$C_n = \sum_{k \geq 0} C_k \binom{n-1}{2k} 2^{n-2k-1}$, which can also be derived from (1) when $x = 1$.
On the other hand, 2-Motzkin paths allow two kinds of horizontal steps, which
are often distinguished from one another by a color, let us say, R and B denoting a red and
a blue step, respectively. We provide a bijection between 2-Motzkin paths and
strings of matching brackets.² Suppose we have a 2-Motzkin path of size $n$ given by
a string $q_1 q_2 \cdots q_n$ over the set {U, D, R, B}. The corresponding string of brackets
$S_n$ can be obtained by the following rules:

1. We begin with “()”: let $S_0$ = “()”.

2. For any $1 \leq k \leq n$, suppose there exist a string of brackets $S'$ and a string
of matching brackets $S''$, both possibly empty, such that $S_{k-1}$ has the form
$S'(S'')$. Then $S_k$ is given by

   $S'((S'')()$ if $q_k$ = U,     $S'(S''))$ if $q_k$ = D,
   $S'(S'')()$ if $q_k$ = R,     $S'((S''))$ if $q_k$ = B.

For example, the string of matching brackets corresponding to the 2-Motzkin
path UBURDD is obtained as follows:

() → (()() → (()(()) → (()((())()
→ (()((())()() → (()((())()()) → (()((())()()))

We remark here that only blue steps can make a stack. In other words, directly
nested structures such as “(())” never occur without blue steps. Therefore, a 1-Motzkin
path can be translated into a string of matching brackets without directly
nested brackets. This is one of the 14 interpretations of Motzkin numbers provided
by Donaghey and Shapiro in [7]. Later, in [11], it was also shown using context-free
grammars in the context of RNA shapes. We also remark that the Motzkin
polynomial coefficient $M(\ell-1, u)$ is the number of ways of arranging $\ell$ pairs of
brackets so that they are correctly matched and contain $\ell - u$ pairs as “()” with no occurrence
of directly nested brackets.
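The rewriting rules can be sketched as a short program. The version below assumes the four rules in the form U: $S'(S'') \to S'((S'')()$, D: $S'(S'') \to S'(S''))$, R: $S'(S'') \to S'(S'')()$ and B: $S'(S'') \to S'((S''))$, where $(S'')$ is the last complete group of the current string; the function names and the small enumeration helpers are illustrative.

```python
from itertools import product
from math import comb


def motzkin_to_brackets(path):
    """Map a 2-Motzkin path over {U, D, R, B} to a string of brackets."""
    s = "()"                                   # S_0
    for step in path:
        # Find where the last complete group "(S'')" starts.
        depth = 0
        for start in range(len(s) - 1, -1, -1):
            depth += 1 if s[start] == ")" else -1
            if depth == 0:
                break
        head, group = s[:start], s[start:]
        if step == "U":                        # open a bracket, add a hairpin
            s = head + "(" + group + "()"
        elif step == "D":                      # close the pending bracket
            s = head + group + ")"
        elif step == "R":                      # add a hairpin
            s = head + group + "()"
        elif step == "B":                      # nest the last group
            s = head + "(" + group + ")"
        else:
            raise ValueError(f"unknown step {step!r}")
    return s


def is_motzkin_path(path):
    """True if the path never dips below the axis and ends on it."""
    height = 0
    for step in path:
        height += {"U": 1, "D": -1}.get(step, 0)
        if height < 0:
            return False
    return height == 0


def motzkin_paths(n, steps="UDRB"):
    """All valid paths of size n using the given step alphabet."""
    return ["".join(p) for p in product(steps, repeat=n) if is_motzkin_path(p)]


def has_direct_nesting(s):
    """True if some pair (i, j) directly encloses the pair (i+1, j-1)."""
    stack, match = [], {}
    for i, ch in enumerate(s):
        if ch == "(":
            stack.append(i)
        else:
            match[stack.pop()] = i
    return any(match.get(i + 1) == j - 1 for i, j in match.items())


def catalan(k):
    return comb(2 * k, k) // (k + 1)


if __name__ == "__main__":
    print(motzkin_to_brackets("UBURDD"))
    for n in range(5):
        images = {motzkin_to_brackets(p) for p in motzkin_paths(n)}
        assert len(images) == catalan(n + 1)   # distinct images, n + 1 pairs each
```

Restricting the alphabet to `"UDR"` gives the 1-Motzkin paths, whose images contain no directly nested brackets.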

Now we go back to the generating function on the right hand side in (15) and
rewrite it as

$$G(x, y, z) = \sum_{\ell, u} M(\ell-1, u)\, (x y^2 z) \big( (1+y)\sqrt{z} \big)^u \big( x y (1+y) z \big)^u \big( (1+y)\sqrt{z} \big)^d \big( (1+y)^2 z + x y (1+y) z \big)^s, \tag{16}$$

where $u$, $d$ and $s$ stand for the number of up, down and horizontal steps, respectively
($u = d$, $u + d + s = \ell - 1$). Let us explain each factor in detail by means of
the above rules. The term $x y^2 z$ is merely the starting hairpin “(_)” (recall that the
exponents of $x$, $y$ and $z$ are the numbers of hairpins, islands and basepairs, resp.).
At each up step, one has a left bracket and a hairpin to add. For a given non-empty
string $S$ of island diagrams, suppose that we add a left bracket; then there are
two possibilities, “($S$” and “(_$S$”, corresponding to $\sqrt{z}$ and $y\sqrt{z}$, respectively.
Thus, we get the factor $(1+y)\sqrt{z}$ at every up step and, in the same manner, at
every down step. Likewise, adding a hairpin introduces the factor $x y (1+y) z$ since
“$S$(_)” and “$S$_(_)” correspond to $x y z$ and $x y^2 z$, respectively. On the other hand, a
horizontal step can be either R or B. A red step adds a hairpin and corresponds to
$x y (1+y) z$. A blue step adds one basepair nesting the string, “($S$)”, and there are
three possibilities: the stack “(($S$))” for $z$, the two bulges “(_($S$))” and “(($S$)_)” for
$yz$ each, and the interior loop “(_($S$)_)” for $y^2 z$. Therefore, we get $(1+y)^2 z + x y (1+y) z$
at each horizontal step.

² Sequences of matching brackets are nothing but Dyck paths. A bijection between Dyck paths and
2-Motzkin paths was introduced by Delest and Viennot [6]. Here we present a mapping
different from the well-known one.

Note that the number of up steps is the number of multiloops since every up step
opens a new multiloop. Thus the generating function written in terms of Motzkin
polynomials can be said to classify island diagrams by the number of basepairs
and multiloops, while the one written in terms of Narayana numbers classifies island
diagrams by the number of basepairs and hairpins.


3 Single-Stack Diagrams and RNA Shapes 


An island diagram is called a single-stack diagram if the length of each stack in
the diagram is 1, so that each basepair is a stem by itself. Let $s(h, I, k)$ denote the
number of single-stack diagrams classified by the number of hairpins ($h$), islands ($I$)
and stems ($k$), and let $S(x, y, z) = \sum_{h, I, k} s(h, I, k)\, x^h y^I z^k$ denote its generating
function. The island diagrams with $k$ stems and $\ell$ basepairs build on the single-stack
diagrams with $k$ stems. The number of ways of stacking $\ell - k$ basepairs on $k$ stems is
$\binom{\ell-1}{k-1}$ and we have

$$g(h, I, \ell) = \sum_{k=1}^{\ell} \binom{\ell-1}{k-1}\, s(h, I, k). \tag{17}$$

Multiplying by $x^h y^I z^\ell$ on each side and summing over $h$, $I$, $\ell$, one finds the relation

$$G(x, y, z) = S\Big(x, y, \frac{z}{1-z}\Big) \tag{18}$$

and equivalently, $S(x, y, z) = G(x, y, z/(1+z))$. In terms of Motzkin polynomials,
the generating function $S(x, y, z)$ expands to

$$S(x, y, z) = \sum_{k, u} M(k-1, u)\, (x y^2 z) \big( (1+y)\sqrt{z} \big)^u \big( x y (1+y) z \big)^u \big( (1+y)\sqrt{z} \big)^d \big( (2y + y^2) z + x y (1+y) z \big)^s, \tag{19}$$

where $u$, $d$ and $s$ stand for the number of up, down and horizontal steps, respectively
($u = d$, $u + d + s = k - 1$). This is the same as (16) except for one thing.
Recall that only blue steps make a directly nested bracket, from which we get
three possibilities by putting underscores, i.e., a stack for $z$, two bulges for $yz$ each and
an interior loop for $y^2 z$. One obtains single-stack diagrams by getting rid of the
possibility of stacking, and hence the one difference is the factor $z$, such that one
has $(2y + y^2) z$ instead of $(1+y)^2 z$.
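The relation between island diagrams and single-stack diagrams can be checked numerically at the level of coefficients: the coefficient of $z^\ell$ in $G$ should equal $\sum_{k=1}^{\ell} \binom{\ell-1}{k-1}$ times the coefficient of $z^k$ in $S$. The sketch below takes the former from the Narayana form (4) and the latter from the Motzkin form of $S$, in which a horizontal step carries the weight $(2y+y^2) + xy(1+y)$, and evaluates both at small integers; the function names are illustrative.

```python
from math import comb


def catalan(k):
    return comb(2 * k, k) // (k + 1)


def narayana(n, k):
    return comb(n, k) * comb(n, k - 1) // n


def motzkin_coeff(n, k):
    return comb(n, 2 * k) * catalan(k)


def island_coefficient(l, x, y):
    """[z^l] G(x, y, z), Narayana form, evaluated at numbers x and y."""
    return sum(
        narayana(l, h) * x ** h * y ** (h + 1) * (1 + y) ** (2 * l - 1 - h)
        for h in range(1, l + 1)
    )


def single_stack_coefficient(k, x, y):
    """[z^k] S(x, y, z), Motzkin form with the stacking possibility removed."""
    up_down = x * y * (1 + y) ** 3                    # one up step and one down step
    level = (2 * y + y * y) + x * y * (1 + y)         # blue (no stack) or red step
    return x * y ** 2 * sum(
        motzkin_coeff(k - 1, u) * up_down ** u * level ** (k - 1 - 2 * u)
        for u in range((k - 1) // 2 + 1)
    )


def stacked_sum(l, x, y):
    """Right hand side of (17), summed against x^h y^I."""
    return sum(
        comb(l - 1, k - 1) * single_stack_coefficient(k, x, y)
        for k in range(1, l + 1)
    )


if __name__ == "__main__":
    for l in range(1, 8):
        for x in range(1, 4):
            for y in range(1, 4):
                assert island_coefficient(l, x, y) == stacked_sum(l, x, y)
    print(single_stack_coefficient(2, 1, 1))
```

The binomial factor counts the ways of distributing the $\ell - k$ extra basepairs over the $k$ stems.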

We mention that the single-stack diagram is closely related to the $\pi'$-shape (or
type 1), which is one of the five RNA abstract shapes provided in [8] classifying secondary
structures according to their structural similarities. The $\pi'$-shape is an abstraction
of secondary structures preserving their loop configurations and unpaired regions.
A stem is represented as one basepair and a sequence of maximally consecutive
unpaired vertices is considered as an unpaired region regardless of the number of
unpaired vertices in it. In terms of the dot-bracket representation, a length-$k$ stem
“$(^k \cdots )^k$” is represented by a pair of square brackets “[···]” and an unpaired
region is depicted by an underscore. For instance, the $\pi'$-shape “_[[[_]_[_]]_]”
abstracts the secondary structure “...((((...)..((...))))..)”. The only difference
between single-stack diagrams and $\pi'$-shapes is whether or not tails are retained.

On the other hand, the $\pi$-shape (or type 5) ignores unpaired regions such that,
for example, the $\pi'$-shape “_[[[_]_[_]]_]” results in the $\pi$-shape “[[][]]”. Consequently,
$\pi$-shapes retain only hairpin and multiloop configurations. One may
immediately notice that the string representations of $\pi$-shapes are nothing but
the sequences of matching brackets without directly nested brackets. Therefore,
as was shown in the previous section, there is a bijection between $\pi$-shapes
and 1-Motzkin paths. Accordingly, one recovers Theorem 3.1 in [11], that the
number of $\pi$-shapes with $\ell$ pairs of square brackets is the Motzkin number $M_{\ell-1}$.
Furthermore, the Motzkin polynomial coefficient $M(\ell-1, u)$ is the number of
$\pi$-shapes with $u$ multiloops and $\ell - u$ hairpins.
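The two shape abstractions can be sketched as string transformations on the dot-bracket representation. The code below is an illustration under the description given here (one square bracket pair per stem, one underscore per unpaired region, and for the $\pi$-shape all unpaired regions ignored before stems are collapsed); it is not the implementation of [8], and the function names are hypothetical.

```python
def pair_table(dot_bracket):
    """Map each bracket position to the position of its partner."""
    stack, match = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            match[i], match[j] = j, i
    return match


def pi_prime_shape(dot_bracket):
    """Collapse every stem to one bracket pair and every dot run to one underscore."""
    match = pair_table(dot_bracket)
    out = []
    for i, ch in enumerate(dot_bracket):
        if ch == ".":
            if not out or out[-1] != "_":
                out.append("_")
            continue
        low, high = min(i, match[i]), max(i, match[i])
        # Skip a pair that is stacked directly inside the pair (low-1, high+1).
        if match.get(low - 1) == high + 1:
            continue
        out.append("[" if ch == "(" else "]")
    return "".join(out)


def pi_shape(dot_bracket):
    """Ignore unpaired regions first, then collapse the resulting stems."""
    return pi_prime_shape(dot_bracket.replace(".", ""))


if __name__ == "__main__":
    structure = "...((((...)..((...))))..)"
    print(pi_prime_shape(structure))
    print(pi_shape(structure))
```

Dropping the dots before collapsing matters: in the example, the two outermost basepairs are separated by a bulge in the $\pi'$-shape but merge into a single bracket pair in the $\pi$-shape.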


Acknowledgements The authors acknowledge the support of this work by the National
Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP)
(NRF-2017R1A2A2A05001164). S.K. Choi is partially supported by the National Natural Science
Foundation of China under grant 11575119.


Narayana Number, Chebyshev Polynomial and Motzkin Path on RNA Abstract Shapes 165 


References 


1. Barrett, C.L., Li, T.J., Reidys, C.M.: RNA secondary structures having a compatible sequence
of certain nucleotide ratios. J. Comput. Biol. 23, 857-873 (2016)

2. Barry, P., Hennessy, A.: A note on Narayana triangles and related polynomials, Riordan arrays,
and MIMO capacity calculations. J. Integer Seq. 14(3), 1-26 (2011)

3. Chen, W.Y., Yan, S.H., Yang, L.L.: Identities from weighted Motzkin paths. Adv. Appl. Math.
41(3), 329-334 (2008). https://doi.org/10.1016/j.aam.2004.11.007.
http://www.sciencedirect.com/science/article/pii/S0196885808000158

4. Choi, S.K., Rim, C., Um, H.: RNA substructure as a random matrix ensemble.
arXiv:1612.07468 [q-bio.QM]

5. Coker, C.: Enumerating a class of lattice paths. Discrete Math. 271(1), 13-28 (2003).
https://doi.org/10.1016/S0012-365X(03)00037-2.
http://www.sciencedirect.com/science/article/pii/S0012365X03000372

6. Delest, M.P., Viennot, G.: Algebraic languages and polyominoes enumeration. Theor. Comput.
Sci. 34(1), 169-206 (1984). https://doi.org/10.1016/0304-3975(84)90116-6.
http://www.sciencedirect.com/science/article/pii/0304397584901166

7. Donaghey, R., Shapiro, L.W.: Motzkin numbers. J. Comb. Theory Ser. A 23(3), 291-301 (1977).
https://doi.org/10.1016/0097-3165(77)90020-6.
http://www.sciencedirect.com/science/article/pii/0097316577900206

8. Giegerich, R., Voss, B., Rehmsmeier, M.: Abstract shapes of RNA. Nucleic Acids Res. 32(16),
4843-4851 (2004). https://doi.org/10.1093/nar/gkh779

9. Hofacker, I.L., Schuster, P., Stadler, P.F.: Combinatorics of RNA secondary structures. Discrete
Appl. Math. 88(1-3), 207-237 (1998)

10. Janssen, S., Reeder, J., Giegerich, R.: Shape based indexing for faster search of RNA family
databases. BMC Bioinformatics 9(1), 131 (2008). https://doi.org/10.1186/1471-2105-9-131

11. Lorenz, W.A., Ponty, Y., Clote, P.: Asymptotics of RNA shapes. J. Comput. Biol. 15(1), 31-63
(2008)

12. Nebel, M.E., Scheid, A.: On quantitative effects of RNA shape abstraction. Theory Biosci.
128(4), 211-225 (2009). https://doi.org/10.1007/s12064-009-0074-z

13. Nussinov, R., Jacobson, A.B.: Fast algorithm for predicting the secondary structure of
single-stranded RNA. Proc. Natl. Acad. Sci. USA 77(11), 6309-13 (1980).
https://doi.org/10.1073/pnas.77.11.6309

14. Orland, H., Zee, A.: RNA folding and large N matrix theory. Nucl. Phys. B620, 456-476
(2002)

15. Reidys, C.M., Huang, F.W.D., Andersen, J.E., Penner, R.C., Stadler, P.F., Nebel, M.E.: Topology
and prediction of RNA pseudoknots. Bioinformatics 27(8), 1076-1085 (2011).
https://doi.org/10.1093/bioinformatics/btr090.
https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btr090

16. Schmitt, W.R., Waterman, M.S.: Linear trees and RNA secondary structure. Discrete Appl.
Math. 51(3), 317-323 (1994). https://doi.org/10.1016/0166-218X(92)00038-N.
http://www.sciencedirect.com/science/article/pii/0166218X9200038N

17. Schuster, P., Stadler, P.F., Renner, A.: RNA structures and folding: from conventional to
new issues in structure predictions. Curr. Opin. Struct. Biol. 7(2), 229-235 (1997).
https://doi.org/10.1016/S0959-440X(97)80030-9.
http://www.sciencedirect.com/science/article/pii/S0959440X97800309

18. Stein, P., Waterman, M.: On some new sequences generalizing the Catalan and Motzkin numbers.
Discrete Math. 26(3), 261-272 (1979). https://doi.org/10.1016/0012-365X(79)90033-5.
http://www.sciencedirect.com/science/article/pii/0012365X79900335

19. Touchard, J.: Sur certaines équations fonctionnelles. In: Proceedings of the International
Congress on Mathematics, Toronto, 1924, vol. 1, pp. 465-472 (1928)

20. Waterman, M.S.: Secondary structure of single-stranded nucleic acids. Adv. Math. Suppl. Stud.
1, 167-212 (1978)

21. Zuker, M., Sankoff, D.: RNA secondary structures and their prediction. Bull. Math. Biol. 46(4),
591-621 (1984). https://doi.org/10.1016/S0092-8240(84)80062-2.
http://www.sciencedirect.com/science/article/pii/S0092824084800622

22. Zuker, M., Stiegler, P.: Optimal computer folding of large RNA sequences using thermodynamics
and auxiliary information. Nucleic Acids Res. 9(1), 133-148 (1981).
https://doi.org/10.1093/nar/9.1.133

A Curious Mapping Between
Supersymmetric Quantum Chains


Gyorgy Z. Feher, Alexandr Garbali, Jan de Gier, and Kareljan Schoutens 


We dedicate this work to our friend and colleague Bernard 
Nienhuis, on the occasion of his 65-th birthday. 


Abstract We present a unitary transformation relating two apparently different 
supersymmetric lattice models in one dimension. The first (Fendley and Schoutens, 
J Stat Mech, P02017, 2007) describes semionic particles on a 1D ladder, with 
1 Introduction


The concept of supersymmetry was conceived in the realm of (high-energy) particle 
physics, where it expresses a fundamental symmetry between bosonic and fermionic 
(elementary) particles or excitations in a quantum field theory or string theory. It 
would seem that in the context of (low-energy) condensed matter systems a similar 
concept is out of place as superpartners to, say, the electron, if such exist at all, 
are far out of sight. Nevertheless, we have learned that supersymmetry can be a 
useful ingredient in relatively simple model systems describing a condensed phase 
of matter. As soon as the relevant degrees of freedom are not all bosonic, the notion 
of a fermionic symmetry becomes feasible. 

A particularly simple supersymmetric lattice model, commonly referred to as
the $M_1$ model, was proposed in [1]. It features itinerant spin-less fermions on
a lattice (or graph), with supersymmetry adding or taking out a single fermion.
Denoting the supercharges as $Q^\dagger$ and $Q$, the Hamiltonian of what is called $\mathcal{N} = 2$
supersymmetric quantum mechanics [2] is defined as

$$H = \{Q^\dagger, Q\}. \tag{1}$$

In the $M_1$ model the non-trivial nature of $H$ is induced by stipulating that fermions
are forbidden to occupy nearest neighbour sites on the lattice. These simple
definitions lead to surprisingly rich and diverse phenomena. On a 1D lattice, the $M_1$
model was found to be critical, and described by the simplest unitary minimal model
of $\mathcal{N} = 2$ superconformal field theory [1]. On 2D lattices, there is the remarkable
phenomenon of superfrustration: an extensive (in the area, that is to say the number
of sites) entropy for zero-energy supersymmetric ground states [3-5].

Additional features of the $M_1$ model in 1D are integrability by Bethe Ansatz and
the existence of a mapping to the XXZ model at anisotropy $\Delta = -1/2$ [6]. These
features were generalized to a class of models called $M_k$, where up to $k$ fermions
are allowed on consecutive lattice sites [6]. The critical behaviour of these models
is captured by the $k$-th minimal model of $\mathcal{N} = 2$ superconformal field theory,
while massive deformations give rise to integrable massive $\mathcal{N} = 2$ QFTs with
superpotentials taking the form of Chebyshev polynomials [7].

This paper is concerned with two other, and seemingly different, incarnations of
supersymmetry in one spatial dimension. The first is a model, proposed by Fendley
and Schoutens (FS) [7], where the supercharges $Q$ and $Q^\dagger$ move particles between
two legs of a zig-zag ladder. This would suggest that the particles on the two legs be
viewed as bosonic and fermionic, respectively, but the situation in the FS model is
different: the phases between the (fermionic) supercharges and the particles are such
that the particles on the two legs are naturally viewed as anyons with statistical angle
$\pm\pi/2$, that is, as semionic particles. Interestingly, pairs of semions on the two legs
can form zero-energy ‘Cooper pairs’ and the model allows multiple supersymmetric
ground states that are entirely made up of such pairs. The FS model is integrable by
Bethe Ansatz and has a close-to-free-fermion spectrum: all energies agree with those
of free fermions on a single chain, but the degeneracies are different.
The second model we discuss was introduced by Feher, de Gier, Nienhuis and
Ruzaczonek (FGNR) [8]. It can be viewed as a particle-hole symmetric version of
the $M_1$ model, where the ‘exclusion’ constraint on the Hilbert space has been relaxed
and where the supercharges are now symmetric between particles and holes. In this
model fermion number conservation is violated, as terms creating and annihilating
pairs of fermions are included in the Hamiltonian. The FGNR model can be conveniently
described in terms of domain walls between particle and hole segments, as the
number of such walls is conserved. This model also allows a Bethe Ansatz solution,
and the spectrum of the periodic chain has been shown to have characteristic
degeneracies. Just as in the FS model, the degeneracies in the FGNR model can
be explained by the formation of zero-energy ‘Cooper pairs’.

The sole purpose of this contribution to the 2017 MATRIX Annals is to establish 
a unitary transformation between the FS and FGNR models on an open chain. Based 
on the similarity of the Bethe ansatz solutions and that of the physical properties the 
existence of such a map is not too surprising. Nevertheless, the details are quite 
intricate. This holds in particular for the phase factors involved in the mapping, 
which achieve the task of transforming domain walls in the FGNR formulation to 
particles which, in the FS formulation, are best interpreted as semions. The non-local
patterns of the phase factors can be compared to the ‘strings’ of minus signs
featuring in the Jordan-Wigner transformation from spins to (spin-less) fermions.


2 Models 


In this section, we define the models [8, 9]. We refer to the model of [9] as the FS model,
and the model of [8] as the FGNR model. Both are spinless fermion models on a chain
of length $L$ with some boundary conditions. The fermionic annihilation and creation
operators $c_i$, $c_i^\dagger$ ($i = 1, \dots, L$) satisfy the usual anticommutation relations

$$\{c_i^\dagger, c_j\} = \delta_{ij}, \qquad \{c_i^\dagger, c_j^\dagger\} = \{c_i, c_j\} = 0. \tag{2}$$

Based on these fermionic operators, the on-site fermion-number and hole-number
operators are defined as

$$n_i = c_i^\dagger c_i, \qquad p_i = 1 - n_i. \tag{3}$$

These operators act in a fermionic Fock space spanned by ket vectors of the form

$$|\tau\rangle = \prod_{i=1}^{L} \big( c_i^\dagger \big)^{\tau_i} |0\rangle, \tag{4}$$

where the product is ordered such that $c_i^\dagger$ with higher $i$ act first. The label is
$\tau = \{\tau_1, \dots, \tau_L\}$, with $\tau_i = 1$ if there is a fermion at site $i$ and $\tau_i = 0$ if there is a hole.
The vacuum state is defined as usual by $c_i |0\rangle = 0$ for $i = 1, \dots, L$. Both models are
supersymmetric chain models where the nilpotent supercharges $Q$, $Q^\dagger$ are built as
sums of local operators.

Originally, the FS model was considered with open boundary conditions [9] and 
the FGNR model with periodic boundary conditions [8]. In this section we give a 
short overview of [9] and [8]. In Sect. 3 we restrict ourselves to the open boundary 
conditions for both models with even L and discuss the mapping between them. 


2.1 FS Model Definition 


In this section we give a short overview of the FS model [9]. Consider the following 
supersymmetry generator 


L/2-1 


rs ‘ Te 
Ome ley a y~ Ceaonwe ih pI ia) Cok, (5) 
k=1 
where 
k 
Op = Yi Dinj. (6) 
j=l 


This supersymmetry generator is nilpotent 


2 se 
Ors = (Oks) =0. (7) 
The Hamiltonian is built up in the usual way 
Hrs =(Qrs, Oys} 
L-1 


= i i 4 “f 
= ear + Cj_1PjCj+1 + Ici Mjej-1 — ic ;_;Njej—-1) 
j=l 


is 
—2 90 njnjy1 + 2Fi + 2F2 + Abnary, (8) 
j=l 
where 
L/2 L/2 
Fi= Sonja, Fx=) 012), Hbndry = m1 — nv. (9) 


j=l j=l 


This model describes an effective one-dimensional model where the fermions are 
hopping on two chains. Conveniently, these chains are denoted with odd and even 
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Fig. 1 Statistical interaction -i il 1 i 
between the two chains: Lo NOS L \LN 
Filled dots represent 

fermions, empty dots empty UN UT AY 
sites. The hopping amplitudes 

depend on the occupation of YL) LIK YI 7 
the other chain 1 i 1 1 -~i | 


site indices and they are coupled in a zig-zag fashion. The F, and F> operators 
are counting the fermions on the two chains, and Hrs is block diagonal in these 
operators. The interaction between the chains is statistical: the hopping amplitude 
picks up an extra i or —i factor, if the fermion “hops over” another fermion on 
the other chain, see Fig. |. There is a further attractive interaction between the two 
chains. 

The model is defined on an open chain, where the boundary interaction is 
encoded in Hyndry. The model can be shown to be solvable by nested coordinate 
Bethe Ansatz [9]. The spectrum of the model is of the same form as for the free 
model which is defined by 


L-1 


Hyree = 2Fi +2Fr + (¢1 Cyt a Cl 1641) + Hindry. (10) 
j=2 


The eigenenergies of Hrs and H free are 


fith 

E=2 )~ (1+cos(2pa)), (11) 
a=1 
IU 

Pa =™az—>: ma € {1,2,..., L/2}, (12) 


where pq are called momenta, f; and f> are the number of fermions on the 
respective chains, i.e. the eigenvalues of F, and Fp. 

The difference between the free and the interacting model is the degeneracy of 
the energy levels. For the free model the Pauli principle is realized by fermions 
on the same chain not sharing the same momentum. The same momentum can be 
shared by two fermions on the two chains. Hence for an eigenenergy characterized 
by the set {77a}, _ fi af P there are (“4 2) ie ! | possible choices, giving the degeneracy for 
the free model. For the interacting skate instead of thinking in terms of fermions it 
worth to consider exclusons and Cooper pairs. Exclusons are fermionic excitations 
satisfying quantization condition (12) with the further restriction that an excluson 
prohibits any other particle to have the same momentum p,. A pair of fermions 
located on different chains can form a Cooper pair. In this case, two of the momenta 
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(say p; and p2) do not satisfy quantization condition (12) and instead they obey 
cos” P= cos” p2.- (13) 


The net energy contribution of the Cooper pair to (11) is zero. 

The spectrum of the FS model is built up as follows: there are f; and 2 fermions 
on the respective chains. Out of these fermions 2C form Cooper pairs and Nj = 
fi — C, No = fo — C are exclusons on the respective chains. An energy level EF is 


characterized by the quantum numbers {mq ae _ has the degeneracy 
Ni +N2\(L/2-N,-—N 
Fe a haa (14) 
MN C 


The first term counts the possible distributions of the exclusons (with fixed 
{ma}N" a quantum numbers) on the two chains. The second term counts the 
degeneracy of the Cooper pairs. The interpretation of the second piece is that the 
Cooper pairs can be thought of indistinguishable quasiparticles like the exclusons 
and there is one possible Cooper pair for each allowed momentum. Moreover, the 
presence of an excluson with a given momentum prohibits a Cooper pair from 
occupying the corresponding level. This gives the spectrum and the degeneracy of 
the FS model. For further details we suggest the original publication [9]. 


2.2 FGNR Model Definition 


In this section we define the FGNR model [8]. We consider a one-dimensional 
supersymmetric lattice model which is a fermion-hole symmetric extension of 
the M, model of [1]. For this purpose define the operators d 4 and e; by 


al = Pi-1C) Di+1, €j = Nj-1CjNj+1. (15) 
Hence d; creates a fermion at position i provided all three of positions i — 1, i and 


i+ 1 are empty. Similarly, e; annihilates a fermion at position i provided i and its 
neighbouring sites are occupied, i.e. 


di |T1 ++. Tj2 000 t+2. : - TL) = (-1)"-1 |T1 ...% 2010 T42.. . TL) 5 


ej |T1 oes 2 111 Tj42... TL) = (-1)%-1 |T1 oo» G2 101 Tj42--- TL) ; (16) 


while these operators nullify all other states. Here .% is the number operator. It 
counts the number of fermions to the left of site 7. 


May te, Me = M, (17) 
j=1 


where -/f is the total fermion number operator. 
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We now define the nilpotent supersymmetric generators for the FGNR model 


L 
Oranr= >) +e),  Qeovr=0. (18) 


i=1 


The Hamiltonian is defined in the usual way 


Hronr = {Ovgne OFGnr}- (19) 


The Hamiltonian splits up naturally as a sum of three terms. The first term consists 
solely of d-type operators, the second solely of e-type operators and the third 
contains mixed terms. 


Hronr = 4, 4+ Ai + Aisi, (20) 


Hy =)? (jai + did!) + (djdiss + 4,141) 


L L 


tu =D (eel ela) +E (ola tee 
i 


L 


Arr = ba (cfd), + disiei + e}, 1d} + diei+1) ; (21) 


i 
where we use periodic boundary conditions 


cee (22) 
Because the d;’s and e;’s are not simple fermion operators, they do not satisfy the 
canonical anticommutation relations. As a result this bilinear Hamiltonian can not 
be diagonalized by taking linear combinations of d, e, d* and e’. 

The term H; alone is the Hamiltonian of the M; model of [1]. The addition of 
the operator e; introduces an obvious fermion-hole symmetry a <> e; to the model. 
It turns out that this symmetry results in a surprisingly large degeneracy across the 
full spectrum of Hrgnr. 

Note that the Hamiltonians H; and H7; each contain only number operators 
and hopping terms and thus conserve the total number of fermions. The third 
Hamiltonian H7;; breaks this conservation law. For example, the term Ce 
sends the state |... 1000...) to |... 1110...), thus creating two fermions. Hence 
the fermion number is not conserved and therefore is not a good quantum number. 
However, the number of interfaces or domain walls between fermions and holes is 
conserved and we shall therefore describe our states in terms of these. 
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2.2.1 Domain Walls 


We call an interface between a string of 0’s followed by a string of 1’s a 01-domain 
wall and a string of 1’s followed by a string of 0’s, a 10-domain wall. For example, 
assuming periodic boundary conditions the configuration 


’ 


000|11|000|1]0000]111 


contains six domain walls, three of each type and starting with a 01-domain wall. 
Let us consider the effect of various terms appearing in (21). As already discussed 
in an example above, the terms in H7;7 correspond to hopping of domain walls and 
map between the following states 


|...1]000...) + esttt|0x), |...of111...) + —|...000|1...), 
(23) 


where the minus sign in the second case arises because of the fermionic nature of 
the model. Hopping of a domain wall always takes place in steps of two hence the 
parity of the positions of the domain walls is conserved. Aside from their diagonal 
terms, H; and H7; correspond to hopping of single fermions or holes and therefore 
to hopping of pairs of domain walls. They give rise to transitions between the states 


|...0|1]00...) + |.--00|1]0...), |.--1Joli1...)<+—|...11)o]a... 
(24) 


Note that in these processes the total parity of positions of interfaces is again 
conserved, i.e. all processes in Hrgypr conserve the number of domain walls at 
even and odd positions separately. 

Finally, the diagonal term )°; (a; dj +d; di + ele; + eie;) in H; and H7; counts 
the number of 010, 000, 111 and 101 configurations. In other words they count the 
number of pairs of second neighbour sites that are both empty or both occupied 


Yd} d; + did} +e} e; + e:e}) = D> (pi-1 pig + ni—1ni41)- (25) 
i i 


This is equivalent to counting the total number of sites minus twice the number of 
domain walls that do not separate a single fermion or hole, i.e. twice the number of 
well separated domain walls. 

Since the number of odd and even domain walls is conserved the Hilbert space 
naturally breaks into sectors labeled by (m,k), where m is the total number of 
domain walls, and k is the number of odd domain walls. Due to periodic boundary 
conditions m is even. 
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2.2.2 Solution of the FGNR Model by Bethe ansatz 


The FGNR model with periodic boundary conditions is solved by coordinate Bethe 
Ansatz [8]. There are two kinds of conserved particles (even and odd domain walls), 
and hence the model is solved by a nested version of the Bethe Ansatz. A solution 
in the (m,k) sector (with m — k even and k odd domain walls) is characterized 
by the Bethe roots {z1,..., Zm; U1,..., Ux}. In other words, z type Bethe roots are 
associated with both kinds of domain walls, and u type of Bethe roots are associated 
with odd domain walls. The complex numbers {z1,..., Zm3 U1,..., Ux} satisfy the 
following Bethe equations, 


k 2 
i +j74/? I] aaah eee — cS = ste} , 
: qe Ge = Wey 


j=lsm (me2N), (26) 


m _ ae | V2 
i to a =e (27) 
jal uy + (2; — 1/z,) 


where the + is the same for all 7. Solutions corresponding to a nonzero Bethe vector 
are such that 


G#G, Viej €{l,....m), 


uj #uj, Vi, j €{I,...,k}. (28) 


Two solutions {Z1,...,Zm3 U1,-.-, Uk}, {215 +++» Zi Uy, --+, U;} lead to the same 
Bethe vector if there exist two permutations 7 € S, ando e€ Sy and a set of 
Zy numbers {¢;}7" , (i.e. €; = +1) such that z; = €j2n(;) and uj; = Us (1): The 
eigenenergy corresponding to the Bethe roots {z1,..., Zm3 U1,..-, Ux} IS 


m 
A=L+ gj +77 —2), (29) 


i=1 


which in fact depends only on the non-nested z Bethe roots. The Bethe equations 
have free fermionic solutions in the following cases. When k = 0 there are no 
Bethe equations at the nested level and the first set of equations simplifies to the 
free fermionic case. When k = | the u = 0 solutions give the free fermionic part of 
the spectrum. It is worth to note that the spectrum of the FGNR model with periodic 
boundary conditions does have a non free fermionic part. This part will not transfer 
to the open boundary case. 
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2.2.3. Cooper Pairs 


Consider a free fermionic solution 


gare, FS 1.3h, (30) 
where J; is a (half-)integer. This solves the Bethe equations for the k = 0 case, or 
the k = ‘ case with u = 0. This same solution can be used to find a solution in the 
sector with two more odd domain walls (with k = 2 and 3 in the respective cases). 
Consider the k = 2 case. Bethe equations (26) with k = 2 are solved by (30) if the 
nested Bethe roots u, uz satisfy 


uz = —-Uuj (31) 


as in this case the two new terms in the first Bethe equations cancel each other 


Spolgy wa Gea iiey 
uy +(zj —1/zj)* ua + (2; — 1/z;)? 


= @j = V2 uit @j— Wz _, (32) 
uy + (zj —1/zj)?_ wu — (2; — 1/z;)? 
Hence the first Bethe equations (26) with solution (30) serve as a consistency 
condition for vu}, u2 


= rs ge 15 (33) 
= pet (aj — TZ; 


For a free fermionic solution there are always (purely imaginary) uv. = —u, type 
nested roots. As this solution has the same z type roots, a solution with the Cooper 
pair has the same energy A as the original one. We can continue like this introducing 
new Cooper pairs. The creation of Cooper pairs is limited by the number of domain 
walls. For further details we suggest [8]. 


3 Mapping Between the Domain Wall and Particle 
Representations 


The definition of the open boundary version of the FGNR model is straightforward. 
In the open Laren version with a system size L the operators dj, d and el, ej are 
defined fori = 1, — 1. When i = 1 we need to introduce the extra 0- th site 
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and fix its state to be empty. With these definitions the open boundary supercharge 
L-1 
OBC 7 
One =D (di +e) (34) 
i=1 


is well defined and nilpotent. In this section we would like to lighten the notation 
for the supercharges and work with their conjugated counterparts. We also need to 
introduce an intermediate model given in terms of the operators 


8i = Pi+1CiNi-1, fi = Ni4+1Ci Pi-1. 
Fix L to be even. We have the following supercharges 


L-1 


OBC 
o, = Gi) = x (di + ¢}) , (35) 
i=l 
L/2-1 
oy = Ors = cle + se Glas (coseita—17/2 + cage! ; (36) 


i=l 


We also define two additional supercharges oy zy and Gg) z, Which represent hopping 
of domain walls and are required as an intermediate step of the mapping, 


L/2-1 
ol,= >. (+4), (37) 
i=1 
L/2-1 . 
Oe =sit Al + > (e241 oF a) ; (38) 


i=1 


Notice that the first terms in the summations in the charges Q; and ron zy contain 
ng. As mentioned above, we need to fix the 0-th site to be unoccupied. Hence the 
eigenvalue of no is always 0. 

We would like to find the map between OF and O}. Assume that it is given by a 
transformation T, 


Q| =TotTt, T = PIM, (39) 


which itself consists of three terms. The first operator M turns creation and 
annihilation of domain walls into hopping of domain walls. The second operator 
I turns domain walls into particles and the third operator P fixes the phase factors 
such that they match (36). 

Now we turn to the discussion of the transformations M, I" and P. 
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3.1 Transformation M 


The first term M translates to the dynamics of domain walls, i.e. it transforms linear 
combinations of the operators d, d* and e, et into linear combinations of f, f* and 
g, gi (see [8] for more details). More precisely: 


\(L-1)/4] 
M= J] (cat —€4;4:)(4i42 — Ch42)s (40) 
i=0 
and for all even i we have 
M(di+e))M' = fitgei, Mdigite}, Mt = fi, t+ gist. (41) 
In other words M turns Q* into a combination of Q, and o; 


MQ;M'=Q.1+Q),. (42) 


Thus we get an intermediate model. 


3.2. Transformation T 


The next transformation J” turns domain walls into particles. In fact, there is a family 
of such transformations. We select I” to be of the following form 


L-1 
P= TT] (ei + mv(ehy + eis). (43) 


i=1 


This operator satisfies I * = ] and transforms the monomials in c; and c of (42) 
into those of Q* (36). More precisely, conjugation by I” has the following effect 


EOse Ol ry =O); (44) 


where the new supercharge 0; differs from Q* only by phase factors. Let us take 
I and act termwise on Q, + Qb, i.e. on the combination 


fits) + fiat gin. 


for the labels i > O and separately on the first term al + gi. We find 


. e bPAl 
‘af (a en a) ris —cleo Pe ae MOj+L (45) 
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and 


‘ 
re(fitel t+ fiten) ry 


: : 2i-1 : L/2-1 
_ (4 in a nj Ay : in Oe n2j4i 
= (ch yen eM PA — ch ena) eM EH (46) 
Therefore 0; as defined in (44) becomes 
: L/2-1 
A at mae 
Q,=-c;c2@ iat OY 
L/2-1 
: 2i-1 , L/2-1 
+ iv 1) Aj 7 in vy M241 
a » (chi seae Dil i chi coisz)e De jaiel MOH (47) 


i=l 


This agrees with (36) up to the phase factors. 


3.3. Transformation P 


Let p(11, v2,..., vz) be an unknown function of a binary string, v; = 0, 1. Write a 
generic phase factor transformation Pz, 


Pps etd P(min2 oe nL) | (48) 


and the function p introduced above denotes the eigenvalue of p on the state 
|v1,..-, Vz). The commutation relations 


P(N, ...,Ni,.-.,NL)Ci =Cip(n,...,nj —1,...,nL), (49) 
D(nq,...,Nj,- ..,nt)e, = c} p(n, ..., Mi +1,...,n2), (50) 

hold on all states of the Hilbert space. Let us find P such that 
O| =P, Ol Pe. (51) 


Commuting Py, through each monomial of 0; and comparing it with the corre- 


sponding monomial of oy we find L — 1 conditions. Commuting P,, with the first 
monomial in (47) and acting on the state |v), ..., v) leads to the first condition 


L/2-1 
pQiit+1,v2—1,...,vL) — p(1, v2,..-, vn) +2 > v2aj41+2=0. (52) 
j=l 
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Commuting P;, with the two bulk terms in (47) leads to 


p(y,..-, vai — 1, vaiga +1,..., 02) — pi, .--, vz) 


2i-1 L/2-1 214 
es a Mya = Yi (1 yy, (53) 
j=itl j=l 
and 
p(v1,.--, vaiti + 1, vait2 —1,..., 02) — pi, -.-, Vz) 
L/2-1 
+2 >> nya t= De Iivj. (84) 
j=it+l 


The second equation here with i = 0 reproduces (52). In the second equation we 
can replace v2j4) with v2j4+; — 1 and v2;42 with v2;4+2 + 1. As a result it becomes 
of the same form as the first one. Hence (53) and (54) together define L equations 
fork =0,..., 2 —1, where for even k one uses (54) and for odd k one uses (53). 
These equations are valid modulo 4 and can be further simplified 


p(y,-.-, v2 — 1, vag + 1,..-, 02) — pi, .--, vz) 


2i-1 L/2-1 
= as itty, +2 a V2j415 (55) 
j=t+l 
and 
P(1,-.-, vidi — 1, vata +1,..., vn) — pt, ..-, vz) 

L/2-1 
-t 1p; +2 x vajt1+2. (56) 

j=! j=itl 


These two equations can be united into one equation using one index k which can 
be odd or even 


D(V1, «+ +5 Vis Vetls +++ VL) — PO, .-., Ve +1, V4 — 1,..., vz) 
= wx(J,.--, V2N), (57) 
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where the right hand side is given by 


k-1 L=1 
wey... v2) = = (-D) + Y0(- Dj + YS = D)vj- 8) 
j=l juk+2 


Therefore we find a set of recurrence relations for the functions p. Note that for a 
given configuration with the binary string (11, ..., v_) these equations are assumed 
to hold for those values of k for which vz = 0 and 1441 = 1. Hence the number of 
such equations is equal to the number of domain walls of type 01. 


3.4 Particle Position Coordinates 


It is more natural to solve Eq. (57) in a basis where the vectors are labelled using 
particle positions x;,. The Hilbert spaces in both models are given by vectors labelled 
by strings of L numbers t; = 0, | 


L a 
aste=(=[] (4) 1) (59) 


We attached the subscript n in the above notation in order to distinguish it from 
another labelling of the vectors in the same Hilbert space. Let m be the number 
of particles in the system. Let us introduce a basis labelled by the positions of the 
particles and let p denote the mapping between the two labellings 


Pt|V1,..-,VL)_ E> [X1,---,Xm)x- (60) 


The numbers x; are the eigenvalues of the operators x; which coincide with the 
eigenvalues of the operators jn; with j = xx. 
Fix m to be the total number of particles in the system and define two functions 
p and w using the mapping p 
p (Am, ..+5 ML) v1, aia | VL)n) _ Pi, ahs tem) Ix1, es Saint) ae > 


p (tix (n1, veag MZ) WPizeses VE) g) = Wk(X1,-+-5Xm)|X1,---.Xm) x > 


with w, being the diagonal operator with the eigenvalues (58). We can now 
rewrite (57) and (58) in the particle position basis 


P(X1, +++ Xjy-+-,Xm) — P(X1,..., xj —1,...,Xm) = Wi (XI, .--, Xm), (61) 
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with 
k-1 m 
fe (x1, Xm) = +E) — ep + DIa-Cp"). 2) 
i=1 i=k+1 


Once again this equation is considered to hold for j such that x; — x;-1 > 1. The 
generic solution of (61) is 


m Xk 
P(X1,--+5 Xm) = p(,2,...,.m)+ >> we, 2,...,k — 1,i, Xe41,---,Xm), 
k=1 i=k+1 
(63) 
which can be checked by a direct calculation. Here p(1,2,...,m) is the initial 
condition and can be chosen to be 0. Inserting wx we get 
m Xk k m 
Ba,....%m =>) DS) [14+ (-D'- epi + DO a-CD*) 
k=1 i=k+1 j=l j=k+1 
(64) 
The required phase transformation takes the form 
Pr = eid (PRL km)op) | (65) 


3.5 Examples 


To illustrate the mapping, we show some examples of corresponding states between 
the FS model (zig-zag ladder) and FGNR model states (up to phase factors +1, -£), 


empty ladder : 0000 0000 0000) rs <> 0|110011001100)rGnr 
single FS semion : 0000 1000 0000)Frs <<  9{110000110011)FrEnrR 
single FS pair : 0000 11000000)Fs << ~ 9|110001001100)rEenrR 
lower leg filled : 1010 10101010)rs5  <+ {00000000 0000) rFEnr 
single FGNR particle : 101001101010)r5 <> 90000 1000 0000) rEenr 
upper leg filled : 010101010101)r5 << 0/101010101010)rGnr 
upper leg plus semion: |110101010101)r5 < 0/010101010101)rGnrR 
filled ladder : L111 1111 1111)r5 <— 9011001100110) rGnr 
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For L = 4 the phase factors take the explicit values 


Pp) =0, p2)=2, pGB)=9, p4)=0 

pd,2)=90, pd,3)=1, pd.4=9, p2,3)=1, pZ2,4)=2, pG,4)=2 
p(1,2,3)=0, p(l,2,4)=2, p(1,3,4)=3, p2,3,4)=3 

BCI, 2, 3,4) =0 (66) 


Finally we provide an explicit example of all the steps in the mapping. Let us act 
with both sides of (39) on the state |010101) 


O; 010101) = PrMQ{M'I* P~'\O10101), 
The action of Oo; results in 


0; |010101) = |001101) +1|010011) — |010110) +1/011001) + |100101), 
and the right hand side is computed as follows 


PrMOQ{M'r' P~'\010101) = —PrMQ{M'r* |010101) 

= PIMQ)M' |011001) = PrM@Q;} (101010) 

= PI’ M( (001010) — |100010) + |101000) + |101110) — |111010) ) 
= PI”(|001001) — |010001) + |011011) + |011101) + |111001) ) 
= P(|001001) — |010001) + 011011) + |011101) + |111001) ) 

= |001101) + i|010011) — |010110) + i 011001) + |100101) . 


4 Conclusion 


We have established a unitary transformation between the FS and FGNR models 
with open boundary conditions. We are confident that this map will be helpful for 
unraveling the properties of these highly intriguing models. For example, in the FS 
formulation it was not clear how to impose periodic boundary conditions without 
losing the supersymmetry—this issue is now resolved. 

It is a pleasure to dedicate this work to Bernard Nienhuis on the occasion of his 
65th birthday. Bernard has always had a special eye for ingenious maps relating 
apparently different models of statistical physics to one another. We can only hope 
that dedicating this work to him finds some justification in our following a similar 
strategy for one of the many integrable models that he has pioneered. 
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Remarks on AY Face Weights ®) 


Check for 
updates 


Atsuo Kuniba 


Abstract Elementary proofs are presented for the factorization of the elliptic 
Boltzmann weights of the AM face model, and for the sum-to-1 property in the 
trigonometric limit, at a special point of the spectral parameter. They generalize 
recent results obtained in the context of the corresponding trigonometric vertex 
model. 


1 Introduction 


In the recent work [8], the quantum R matrix for the symmetric tensor representation 
of the Drinfeld-Jimbo quantum affine algebra U, (AM) was revisited. A new 
factorized formula at a special value of the spectral parameter and a certain 
sum rule called sum-to-1 were established. These properties have led to vertex 
models that can be interpreted as integrable Markov processes on one-dimensional 
lattice including several examples studied earlier [7, Fig. 1,2]. In this note we 
report analogous properties of the Boltzmann weights for yet another class of 
solvable lattice models known as IRF (interaction round face) models [2] or face 
models for short. More specifically, we consider the elliptic fusion AY face model 
corresponding to the symmetric tensor representation [5, 6]. For n = 1, it reduces 
to [1] and [4] when the fusion degree is 1 and general, respectively. There are 
restricted and unrestricted versions of the model. The trigonometric case of the latter 
reduces to the U, (AM) vertex model when the site variables tend to infinity. See 
Proposition |. In this sense Theorems | and 2 given below, which are concerned 
with the unrestricted version, provide generalizations of [8, Th. 2] and [8, eq. (30)] 
so as to include finite site variables (and also to the elliptic case in the former). In 
Sect. 3 we will also comment on the restricted version and difficulties to associate 
integrable stochastic models. 
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2 Results 


Let 6\(u) = 61 (u, p) = 2p3 sinzu[ [2,1 — 2p** cos 2xu + p*)(1 — p**) be 
one of the Jacobi theta function (|p| < 1) enjoying the quasi-periodicity 


O(u +1; ra = —6)(u; ey, O(u+T; rn) = ag FP (iy gry, 


(1) 


where Imt > 0. We set 


u uy tule 
[uw] =O.) We = (wll = 1 wk + 1, Lil=ia: (k € Zs0), 
(2) 


with a nonzero parameter L. These are elliptic analogue of the g-factorial and the 
q-binomial: 


m-—1 ; m (q)m 
m= %3Q)n = 1—zq' : ~ @i@Qm—1 
(Z)m = (23g) I a") @) (Q)iQ)m—1 


For a = (@,...,@x) with any k we write |a| = a; + --- + ax. The relation 
B => y or equivalently y < 6 means Bi > y; for alli. 


We take the set of local states as Y = + Z*! with a generic n € C”*!, Given 
positive integers / and m, let a, b, c,d € FY be the elements such that 


a=d-—-aceB, B=c-déeBn, y=c-beEeB, b=b—-ae Bn, 
(3) 
where B,, is defined by 
+1 


Bm = {@ = (@1,---,Q@n+1) € ha | |a| = m}. (4) 


The relations (3) imply aw + 8 = y + 6. The situation is summarized as 


é 
a 4 b 
a > VY 
d Cc 
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To the above configuration round a face we assign a function of the spectral 
parameter u called Boltzmann weight. Its unnormalized version, denoted by 


Wim ‘ : lw), is constructed from the / = 1 case as follows: 


— sab = ap 
Wim(, i u) = LT Mim (Gory pith u-i), (5) 
rc — 
+1 
— sab [u + by — ap] TT 2; (54, )lov — aj +1 
Wim( |) = ————_ (d=atey, c=b+ey), 
oe Tj zilev — bj] 


ith = 
where e; = (0,...,0, 1,0,...,0).In(5),a,...,a” € Pisapath forma = 
a toa” = d such that a“t!) — a € B,( <i < 1). The sum is taken over 
bY, ...,b¢-) € FY satisfying the conditions b“+!) — b® € By (0 <i <1) with 
b© = band b© = c. It is independent of the choice of a) @@-D (cf. [4, 
Fig. 2.4]). We understand that Wim(4 2 |e) = 0 unless (3) is satisfied for some 


a, B,y, 6. 
The normalized weight is defined by 


wim 2h) = Fim Cel | © 


It satisfies [6] the (unrestricted) star-triangle relation (or dynamical Yang-Baxter 
equation) [2]: 


(7) 


where the sum extends over g € Pp giving nonzero weights. Under the same 
setting (3) as in (6), we introduce the product 


sie et] TL amen @) 


ere Lei — bj leo; 


Note that S71 G >) = Ounlessd < b because of the factor ie [ci —dj]c;—-p,. The 


following result giving an explicit factorized formula of the weight W/, at special 
value of the spectral parameter is the elliptic face model analogue of [8, Th. 2]. 
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Theorem 1 /f/ < m, the following equality is valid: 


ab ab 
Win(% fe =0) = Sin(*) ) 
Proof We are to show 
— sab [2] [ci — djle;—v; 
W 0 — te pad ane ho 10 
nm( 7 Cc ) [1]! I] [ci — bj\c;-0; id 


Here and in what follows unless otherwise stated, the sums and products are taken 
always over 1,...,-+ 1 under the condition (if any) written explicitly. We invoke 
the induction on /. It is straightforward to check (10) for / = 1. By the definition (5) 
the / + 1 case is expressed as 


Wistm(G ° 0) = EW in( yo 


,|0)W, m ie 


-1) (d’ =d-ey,c =c-e) 


for some fixed yz € [1, n +1]. Due to the induction hypothesis on W),m, the equality 
to be shown becomes 


y- (1) (T] [c; _ die, a= i+ Cc, _ 1 Whetuley —d.t+ 1] 
TEM, + 1 = bla Halen = ei] 
(11) 
— t+. Tl le; — dj\c;—0; 
TE fer — bla 
After removing common factors using é = cj — diy, d; = d; — 5;,, one finds 


that (11) is equivalent to 


Sle = dy = OT SE te = b= + Ys = dy + 0 


Cy — Ci 
v iv v— ci j i 


with / determined by / + 1 = )° ; (c; — b;). One can eliminate d,, and rescale the 
variables by (bj,cj) > (Lb; + d,, Lcj + d,) for all j. The resulting equality 
follows from Lemma 1. 


Lemma 1 Let bj, ...,bn,C1,---,¢n € C be generic and set s = )~"_,(c; — bj). 
Then for any n € Zs, the following identity holds: 


Yacta-s) T] eee Tet — bj 5) = 6108) | Joule + bp. 


i=l j=l G#i) A(ci — ao i=l 
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Proof Denote the LHS—RHS by f(z). From (1) we see that f(z) satisfies (12) with 
B=5,A1= atte) + Li b; and Az = n. Moreover it is easily checked that 
f(Z) possesses zeros at z = —cj,...,—Cy,. Therefore Lemma 2 claims —(c, + 
ss) +Cy) — (Bt t+ 5 Ap — Aj) =0mod Z+ Zr. But this gives s = 0 which is a 
contradiction since b;, c; can be arbitrary. Therefore f(z) must vanish identically. 


Lemma 2 Let Imt > 0. Suppose an entire function f(z) ¥ 0 satisfies the quasi- 
periodicity 


FEtY=ePMFF®, f+ 1) =e PATA FQ), (12) 


Then Az € Zso holds and f(z) has exactly Az zeros z1,...,ZA, mod Z + Zt. 
Moreover z1 +---+Z4, = Btt+ 5A2 — A, mod Z+ Zt holds. 


Proof Let C be a period rectangle (§, + 1, + 1+ 1,&-+ 17) on which there is no 
zero of f(z). From the Cauchy theorem the number of zeros of f(z) in C is equal to 


I, Cc L@ dz Calculating the integral by using (12) one gets Az. The latter assertion 


Tt) 2ni 
can be shown similarly by considering the integral /¢. a om fe 


From Theorem | and (7) it follows that S) j, i p) also satisfies the (unrestricted) 


star-triangle relation (7) without spectral parameter. The discrepancy of the factoriz- 
ing points wu = 0 in (9) and “u = 1—m” in [8, Th. 2] is merely due to a conventional 
difference in defining the face and the vertex weights. 

Since (6) and (8) are homogeneous of degree 0 in the symbol [- - - ], the trigono- 
metric limit p —> 0 may be understood as replacing (2) by [u] = q“/? — q~“/? with 
generic q = exp amt Under this prescription the elliptic binomial (7 | from (2) is 


replaced by g!¢—™)/2 (ae therefore the trigonometric limit of (8) becomes 


a b m\~! ( mie 
c/ trig @ 1<ijent1 (qr I ejb; 


The following result is a trigonometric face model analogue of [8, Th. 6]. 


Theorem 2 Suppose 1 < m. Then the sum-to-1 holds in the trigonometric case: 


ab 
Vial) = (14) 
b c/ trig 


where the sum runs over those b satisfying c— d € By andd —a € Bj. 


Proof The relation (14) is equivalent to 


m (q%ii—Vi Bj oe 
(7), Xl Gamo, “seg o> 


yeB,,y<B 1<i,j<n+1 
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for any fixed 6 = (61,..., Bn+1) € Bm, / < m and the parameters c),..., Cn41, 
where the sum is taken over y € B; (4) under the constraint y < 6. In fact we are 
going to show 


a —Vitls. I (> sy: 
(w, Pieh wig), 7 (q vit zi/(zjwj))y, a 
gy a A GrMg fz) «(1 © 20 
lial i<ijcn “4 i/Zj)y, 
(16) 
where the sum is over y € Z2) such that |y| = J, and wi,...,Wn,Z1,+--52n 


are arbitrary parameters. The relation (15) is deduced from (16)|n+n+1 by setting 
zi = qi, w; = qi and specializing B;’s to nonnegative integers. In particular, the 
constraint y < B automatically arises from the i = j factor [T/_,(q7%t!**),, in 
the numerator. To show (16) we rewrite it slightly as 


q? (w1 on ~ Tot we widy I] Nea (17) 


ly|=li=1 Dy l<i¢jen (q7Vizj;/zi)y; 


Denote the RHS by $F_n(w_1, \ldots, w_n | z_1, \ldots, z_n)$. We will suppress a part of the
arguments when they are kept unchanged in the formulas. It is easy to see


a Z1UW1 


Fy (wy, w2|Z1, 22) = Fr(w2, wi|z2, 21) = 


Thus the coefficients in the expansion F;,(w1, w2|Z1, Z2) = Vo<i,j<i Ci, j (Z1, 22) 
wi ws are rational functions in z1,...,Z, obeying Cj, ;(Z1, 22) = Cj,i(Z2, 21) = 
(2) ae j,i (21, 22). On the other hand from the Se formula a one also finds 


that any Cj, ;(z1, Z2) remains finite in the either limit 2 aL as > oo or = ae ee — 0 for 
i => 3. It follows that C;, ;(z1, z2) = 0 unless i = j, henge 


Fy(w1, W2,..., WalZ1,---,2Zn) = Pn, wiwe, w3,..., WalZ1,--+5 Zn). 
Moreover it is easily seen 
Fr(, wiw2, w3,..., WalZ1, 22,+-+5 Zn) = Pn—-1(w1W2, W3,..., WnlZ2,---5 Zn). 


Repeating this we reach $F_1(w_1 \cdots w_n | z_n)$ giving the LHS of (17).


We note that the sum-to-1 (14) does not hold in the elliptic case. Remember that
our local states are taken from $\mathcal{P} = \eta + \mathbb{Z}^{n+1}$ with a generic $\eta \in \mathbb{C}^{n+1}$. So we set
$a = \eta + \tilde{a}$ with $\tilde{a} \in \mathbb{Z}^{n+1}$ etc. in (4), and assume that it is valid also for $\tilde{a}, \tilde{b}, \tilde{c}, \tilde{d}$. It
is easy to check


Proposition 1 Assume $l \le m$ and $|q| < 1$. Then the following equality holds:


2 é oes: -1ntl 79 
lim S; ae ee *) = ghi<i(Bi-W7; (") I ("") (18) 
00 n+d n+c/trig I m4 Wid g 


q 


where the limit means $\eta_i - \eta_{i+1} \to \infty$ for all $1 \le i \le n$, and the RHS is zero unless
$0 \le \gamma_i \le \beta_i$, $\forall i$.


The limit reduces the unrestricted trigonometric $A_n^{(1)}$ face model to the vertex model
at a special value of the spectral parameter in the sense that the RHS of $(18)|_{q \to q^2}$
reproduces [8, eq. (23)] that was obtained as the special value of the quantum $R$
matrix associated with the symmetric tensor representation of $U_q(A_n^{(1)})$.


3 Discussion 


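Since the weights $W_{l,m}\left(\begin{smallmatrix} a & b \\ d & c \end{smallmatrix} \middle| u\right)$ remain unchanged by shifting $a, b, c, d \in \mathcal{P}$ by
$\mathrm{const} \cdot (1, \ldots, 1)$, we regard them as elements from $\overline{\mathcal{P}} := \mathcal{P}/\mathbb{C}(1, \ldots, 1)$ in
the sequel. Given $l, m_1, \ldots, m_M \in \mathbb{Z}_{\ge 1}$ and $u, w_1, \ldots, w_M \in \mathbb{C}$, the transfer
matrix $T_l(u) = T_l\left(u \middle| \begin{smallmatrix} m_1, \ldots, m_M \\ w_1, \ldots, w_M \end{smallmatrix}\right)$ of the unrestricted $A_n^{(1)}$ face model with periodic
boundary condition is a linear map on the space of independent row configurations
on a length $M$ row $\bigoplus \mathbb{C}|a^{(1)}, \ldots, a^{(M)}\rangle$, where the sum is taken over $a^{(1)}, \ldots, a^{(M)} \in \overline{\mathcal{P}}$
such that $a^{(i+1)} - a^{(i)} \in B_{m_i}$ ($a^{(M+1)} = a^{(1)}$). Its action is specified as
$T_l(u)|b^{(1)}, \ldots, b^{(M)}\rangle = \sum_{a^{(1)}, \ldots, a^{(M)}} T_l(u)^{a^{(1)}, \ldots, a^{(M)}}_{b^{(1)}, \ldots, b^{(M)}} |a^{(1)}, \ldots, a^{(M)}\rangle$ in terms of the
matrix elements


\[ T_l(u)^{a^{(1)}, \ldots, a^{(M)}}_{b^{(1)}, \ldots, b^{(M)}} = \prod_{i=1}^{M} W_{l,m_i}\left(\begin{matrix} a^{(i)} & a^{(i+1)} \\ b^{(i)} & b^{(i+1)} \end{matrix} \middle| u - w_i\right) \qquad (a^{(M+1)} = a^{(1)},\ b^{(M+1)} = b^{(1)}). \tag{19} \]
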
Theorem 1 tells that $S_l := T_l(u)|_{u = w_1 = \cdots = w_M}$ has simple factorized matrix
elements. We write its elements as $(S_l)^{a^{(1)}, \ldots, a^{(M)}}_{b^{(1)}, \ldots, b^{(M)}}$. The star-triangle relation (7) implies
the commutativity $[T_l(u), T_{l'}(u')] = [S_l, S_{l'}] = 0$.
Let us consider whether $X = T_l(u)$ or $S_l$ admits an interpretation as a Markov
matrix of a discrete time stochastic process. The related issue was treated in [3]
for $n = 1$ and mainly when $\min(l, m_1, \ldots, m_M) = 1$. One needs (i) sum-to-1
property $\sum_{a^{(1)}, \ldots, a^{(M)}} X^{a^{(1)}, \ldots, a^{(M)}}_{b^{(1)}, \ldots, b^{(M)}} = 1$ and (ii) nonnegativity $\forall X^{a^{(1)}, \ldots, a^{(M)}}_{b^{(1)}, \ldots, b^{(M)}} \ge 0$. We
concentrate on the trigonometric case in what follows. From Theorem 1 and the fact
that $S_{l,m}\left(\begin{smallmatrix} a & b \\ d & c \end{smallmatrix}\right)_{\mathrm{trig}}$ in (13) is independent of $a$, (i) indeed holds for $S_l$. On the other
hand (13) also indicates that (ii) is not valid in general without confining the site
variables in a certain range. A typical such prescription is restriction [4-6], where
one takes $L = \ell + n + 1$ in (2) with some $\ell \in \mathbb{Z}_{\ge 1}$ and lets the site variables range
over the finite set of level $\ell$ dominant integral weights $\{(L + a_{n+1} - a_1 - 1)\Lambda_0 +
\sum_{i=1}^{n} (a_i - a_{i+1} - 1)\Lambda_i \mid L + a_{n+1} > a_1 > \cdots > a_{n+1},\ a_i - a_j \in \mathbb{Z}\}$. They are
to obey a stronger adjacency condition [6, p. 546, (c-2)] than (3) which is actually
the fusion rule of the WZW conformal field theory. (The formal limit $\ell \to \infty$
still works to restrict the site variables to the positive Weyl chamber and is called
“classically restricted”.) Then the star-triangle relation remains valid by virtue of
nontrivial cancellation of unwanted terms. However, discarding the contribution to
the sum (14) from those $b$ not satisfying the adjacency condition spoils the sum-to-1
property. For example when $(n, l, m) = (2, 1, 2)$, $a = (2, 1, 0)$, $c = (4, 2, 0)$, $d =
(3, 1, 0)$ and $\ell$ is sufficiently large, the unrestricted sum (14) consists of two terms
=A f=. v; =1 7,3: 
Sim( 4 Oe i (), - a forbs Ue Oyand Sim (4 ae = ()q é an 
for b’ = (3, 2,0) summing up to 1, but b’ must be discarded in the restricted case 


since a = b’ [6, (c-2)] does not hold. Thus we see that in order to satisfy (i) and (ii) 
simultaneously one needs to resort to a construction different from the restriction. 
Boundary Regularity
of Mass-Minimizing Integral Currents
and a Question of Almgren


Camillo De Lellis, Guido De Philippis, Jonas Hirsch, and Annalisa Massaccesi 


Abstract This short note is the announcement of a forthcoming work in which
we prove a first general boundary regularity result for area-minimizing currents in
higher codimension, without any geometric assumption on the boundary, except
that it is an embedded submanifold of a Riemannian manifold, with a mild amount
of smoothness ($C^{3,a_0}$ for a positive $a_0$ suffices). Our theorem allows us to answer a
question posed by Almgren at the end of his Big Regularity Paper. In this note we
discuss the ideas of the proof and we also announce a theorem which shows that the
boundary regularity is in general weaker than the interior regularity. Moreover we
remark on an interesting elementary byproduct concerning boundary monotonicity formulae.


1 Introduction 


Consider a smooth complete Riemannian manifold $\Sigma$ of dimension $m + n$ and
a smooth closed oriented submanifold $\Gamma \subset \Sigma$ of dimension $m - 1$ which is a
boundary in integral homology. Since the pioneering work of Federer and Fleming
(cf. [20]) we know that $\Gamma$ bounds an integer rectifiable current $T$ in $\Sigma$ which
minimizes the mass among all integer rectifiable currents bounded by $\Gamma$.

In general, consider an open $U \subset \Sigma$ and a submanifold $\Gamma \subset \Sigma$ which has no
boundary in $U$. If $T$ is an integral current in $U$ with $\partial T \llcorner U = [\![\Gamma]\!] \llcorner U$ we say that
$T$ is mass-minimizing if


\[ \mathbf{M}(T + \partial S) \ge \mathbf{M}(T) \]


for every integral current $S$ in $U$.

Starting with the pioneering work of De Giorgi (see [7]) and thanks to the efforts
of several mathematicians in the sixties and the seventies (see [3, 8, 21, 29]), it is
known that, if $\Sigma$ is of class $C^{2,a_0}$ for some $a_0 > 0$, in codimension 1 (i.e., when $n =
1$) and away from the boundary $\Gamma$, $T$ is a smooth submanifold except for a relatively
closed set of Hausdorff dimension at most $m - 7$. Such a set, which from now on we
will call interior singular set, is indeed $(m - 7)$-rectifiable (cf. [28]) and it has
been recently proved that it must have locally finite Hausdorff $(m - 7)$-dimensional
measure (see [27]). In higher codimension, namely when $n \ge 2$, Almgren proved
in a monumental work (known as Almgren’s Big Regularity Paper [4]) that, if $\Sigma$
is of class $C^5$, then the interior singular set of $T$ has Hausdorff dimension at most
$m - 2$. In a series of papers (cf. [9-13]) the first author and Emanuele Spadaro have
revisited Almgren’s theory introducing several new ideas which simplify his proof
considerably. Furthermore, the first author together with Spadaro and Spolaor, in
[14-17] applied these sets of ideas to establish a complete proof of Chang’s interior
regularity results for 2-dimensional mass-minimizing currents [6], showing that in
this case interior singular points are isolated.

Both in codimension one and in higher codimension the interior regularity theory
described above is, in terms of dimensional bounds for the singular set, optimal
(cf. [5] and [19]). In the case of boundary points the situation is instead much
less satisfactory. The first boundary regularity result is due to Allard who, in his
Ph.D. thesis (cf. [1]), proved that, if $\Sigma = \mathbb{R}^{m+n}$ and $\Gamma$ is lying on the boundary
of a uniformly convex set, then for every point $p \in \Gamma$ there is a neighborhood $W$
such that $T \llcorner W$ is a classical oriented submanifold (counted with multiplicity 1)
whose boundary (in the usual sense of differential topology) is $\Gamma \cap W$. In his later
paper [2] Allard developed a more general boundary regularity theory from which
he concluded the above result as a simpler corollary.

When we drop the “convexity assumption” described above, the same conclusion
cannot be reached. Let for instance $\Gamma$ be the union of two concentric circles $\gamma_1$ and
$\gamma_2$ which are contained in a given 2-dimensional plane $\pi_0 \subset \mathbb{R}^{2+n}$ and have the
same orientation. Then the area-minimizing current $T$ in $\mathbb{R}^{2+n}$ which bounds $\Gamma$ is
unique and it is the sum of the two disks bounded by $\gamma_1$ and $\gamma_2$ in $\pi_0$, respectively.
At every point $p$ which belongs to the inner circle the current $T$ is “passing” through
the circle while the multiplicity jumps from 2 to 1. However it is natural to consider
such points as “regular”, motivating therefore the following definition.


Definition 1 A point $x \in \Gamma$ is a regular point for $T$ if there exist a neighborhood
$W \ni x$ and a regular $m$-dimensional connected submanifold $\Sigma_0 \subset W \cap \Sigma$ (without
boundary in $W$) such that $\mathrm{spt}(T) \cap W \subset \Sigma_0$. The set of such points will be denoted
by $\mathrm{Reg}_b(T)$ and its complement in $\Gamma$ will be denoted by $\mathrm{Sing}_b(T)$.


By the Constancy Lemma, if $x \in \Gamma$ is a regular point, if $\Sigma_0$ is as in Definition 1
and if the neighborhood $W$ is sufficiently small, then the following holds:


1. $\Gamma \cap W$ is necessarily contained in $\Sigma_0$ and divides it in two disjoint regular
submanifolds $\Sigma_0^+$ and $\Sigma_0^-$ of $W$ with boundaries $\pm\Gamma$;
2. there is a positive $Q \in \mathbb{N}$ such that $T \llcorner W = Q\,[\![\Sigma_0^+]\!] + (Q - 1)\,[\![\Sigma_0^-]\!]$.


We define the density of such points $p$ as $Q - \frac{1}{2}$ and we denote it by $\Theta(T, p) =
Q - \frac{1}{2}$.
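
To make the half-integer densities concrete, here is a minimal Python sketch for the two concentric circles discussed earlier. It computes the normalized mass ratio $\|T\|(B_r(p))/(\pi r^2)$ inside the plane $\pi_0$, using the standard lens-area formula for two intersecting disks. The radii (inner circle 1, outer circle 2) and the function names are assumptions chosen for illustration, not taken from the text.

```python
import math


def lens_area(r, R):
    """Area of B_r(p) intersected with the disk of radius R centred at the
    origin, where p lies on the circle of radius R (centre distance d = R)."""
    d = R
    a1 = r * r * math.acos((d * d + r * r - R * R) / (2 * d * r))
    a2 = R * R * math.acos((d * d + R * R - r * r) / (2 * d * R))
    a3 = 0.5 * math.sqrt((-d + r + R) * (d + r - R) * (d - r + R) * (d + r + R))
    return a1 + a2 - a3


def mass_ratio(r, R, mult_inside, mult_outside):
    """Normalized mass of a planar current in B_r(p), |p| = R, with constant
    multiplicity mult_inside on {|x| < R} and mult_outside on {|x| > R}.
    Only valid while B_r(p) meets no other boundary circle."""
    inside = lens_area(r, R)
    outside = math.pi * r * r - inside
    return (mult_inside * inside + mult_outside * outside) / (math.pi * r * r)


if __name__ == "__main__":
    for r in (0.1, 0.01, 0.001):
        inner = mass_ratio(r, 1.0, 2, 1)  # point on the inner circle: Q = 2
        outer = mass_ratio(r, 2.0, 1, 0)  # point on the outer circle: Q = 1
        print(r, inner, outer)
```

As $r \downarrow 0$ the two ratios approach $3/2$ and $1/2$, that is $Q - \frac{1}{2}$ with $Q = 2$ and $Q = 1$ respectively.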

If the density is $\frac{1}{2}$ then the point fulfills the conclusions of Allard’s boundary
regularity theorem and $\Sigma_0$ is not uniquely determined: the interesting geometrical
object is $\Sigma_0^+$ and any smooth “extension” of it across $\Gamma$ can be taken as $\Sigma_0$. On the
other hand for $Q \ge 2$ the local behavior of the current is similar to the example of
the two circles above: it is easy to see that $\Sigma_0$ is uniquely determined and that it has
mean curvature zero.

When the codimension of the area-minimizing current is 1, Hardt and Simon
proved in [23] that the set of boundary singular points is empty, hence solving
completely the boundary regularity problem when $\bar{n} = 1$ (although the paper [23]
deals only with the case $\Sigma = \mathbb{R}^{m+1}$, its extension to a general Riemannian ambient
manifold should not cause real issues). In the case of general codimension and
general $\Gamma$, Allard’s theory implies the existence of (relatively few) boundary regular
points only in special ambient manifolds $\Sigma$: for instance when $\Sigma = \mathbb{R}^{m+n}$ we can
recover the regularity of the “outermost” boundary points $q \in \Gamma$ (i.e., those points
$q$ where $\Gamma$ touches the smallest closed ball which contains it, cf. [24]). According
to the existing literature, however, we cannot even exclude that the set of regular
points is empty when $\Sigma$ is a closed Riemannian manifold. In the last remark of the
last section of his Big Regularity Paper, cf. [4, Section 5.23, p. 835], Almgren states
the following open problem, which is closely related to the discussion carried out above.


Question 1 (Almgren) “I do not know if it is possible that the set of density $\frac{1}{2}$ points
is empty when $U = \Sigma$ and $\Gamma$ is connected.”


The interest of Almgren in Question 1 is motivated by an important geometric
conclusion: in [4, Section 5.23] he shows that, if there is at least one density $\frac{1}{2}$
point and $\Gamma$ is connected, then $\mathrm{spt}(T)$ is connected as well and the current $T$ has
(therefore) multiplicity 1 almost everywhere. In other words the mass of $T$ coincides
with the Hausdorff $m$-dimensional measure of its interior regular set.

In the forthcoming paper [18] we show the first general boundary regularity
result in any codimension, which guarantees the density of boundary regular points
without any restriction (except for a mild regularity assumption on $\Gamma$ and $\Sigma$: both
are assumed to be of class $C^{3,a_0}$ for some positive $a_0$; note that such regularity
assumption for the ambient manifold coincides with the one of the interior regularity
theory as developed in the papers [9-13], whereas Almgren’s Big Regularity Paper
[4] assumes $C^5$). As a corollary we answer Almgren’s question in full generality
showing: when $U = \Sigma$ and $\Gamma$ is connected, then there is always at least one point
of density $\frac{1}{2}$ and the support of any minimizer is connected. In the next section
we will state the main results of [18], whereas in Sect. 3 we will give an account
of their (quite long) proofs. Finally, in Sect. 4 we outline an interesting side remark
sparked by one of the key computations in [18]. The latter yields an alternative proof
of Allard’s boundary monotonicity formula under slightly different assumptions: in
particular it covers, at the same time, the Grüter-Jost monotonicity formula for free
boundary stationary varifolds.


2 Main Theorems 


Our main result in [18] is the following 


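Theorem 1 Consider a $C^{3,a_0}$ complete Riemannian submanifold $\Sigma \subset \mathbb{R}^{m+n}$ of
dimension $m + \bar{n}$ and an open set $W \subset \mathbb{R}^{m+n}$. Let $\Gamma \subset \Sigma \cap W$ be a $C^{3,a_0}$ oriented
submanifold without boundary in $W \cap \Sigma$ and let $T$ be an integral $m$-dimensional
mass-minimizing current in $W \cap \Sigma$ with boundary $\partial T \llcorner W = [\![\Gamma]\!]$. Then $\mathrm{Reg}_b(T)$
is dense in $\Gamma$.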


As a simple corollary of the theorem above, we conclude that Almgren’s
Question 1 has a positive answer.


Corollary 1 Let $W = \mathbb{R}^{m+n}$ and assume $\Sigma$, $\Gamma$ and $T$ are as in Theorem 1. If $\Gamma$ is
connected, then


1. Every point in $\mathrm{Reg}_b(T)$ has density $\frac{1}{2}$;

2. The support $\mathrm{spt}(T)$ of the current $T$ is connected;

3. The multiplicity of the current is 1 at $\mathcal{H}^m$-a.e. interior point, and so the mass of
the current coincides with $\mathcal{H}^m(\mathrm{spt}(T))$.


In fact the above corollary is just a case of a more general “structural” result, 
which is also a consequence of Theorem 1. 


Theorem 2 Let $W = \mathbb{R}^{m+n}$ and assume $\Sigma$, $\Gamma$ and $T$ are as in Theorem 1 and that
$\Gamma$ is in addition compact. Denote by $\Gamma_1, \ldots, \Gamma_N$ the connected components of $\Gamma$.
Then


\[ T = \sum_{j=1}^{N} Q_j T_j, \tag{1} \]


where:


(a) For every $j = 1, \ldots, N$, $T_j$ is an integral current with $\partial T_j = \sum_{i=1}^{N} \sigma_{ij}\,[\![\Gamma_i]\!]$
and $\sigma_{ij} \in \{-1, 0, 1\}$.

(b) For every $j = 1, \ldots, N$, $T_j$ is an area-minimizing current and $T_j = \mathcal{H}^m \llcorner \Lambda_j$,
where $\Lambda_1, \ldots, \Lambda_N$ are the connected components of $\mathrm{Reg}_i(T)$, the interior
regular set.

(c) Each $\Gamma_i$ is


a. either one-sided, which means that all coefficients $\sigma_{ij} = 0$ except for one
$j = o(i)$ for which $\sigma_{i\,o(i)} = 1$;
b. or two-sided, which means that:


i. there is one $j = p(i)$ such that $\sigma_{i\,p(i)} = 1$,
ii. there is one $j = n(i)$ such that $\sigma_{i\,n(i)} = -1$,
iii. all other $\sigma_{ij} = 0$.


(d) If $\Gamma_i$ is one-sided, then $Q_{o(i)} = 1$ and all points in $\Gamma_i \cap \mathrm{Reg}_b T$ have
multiplicity $\frac{1}{2}$.

(e) If $\Gamma_i$ is two-sided, then $Q_{n(i)} = Q_{p(i)} - 1$, all points in $\Gamma_i \cap \mathrm{Reg}_b T$ have
multiplicity $Q_{p(i)} - \frac{1}{2}$ and $T_{p(i)} + T_{n(i)}$ is area minimizing.
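
As an illustration (not taken from [18]), the structure described in Theorem 2 can be written out for the two concentric circles of the Introduction. The labels below are assumptions made for this sketch: $\gamma_1$ is the inner circle, $\gamma_2$ the outer one, $D_1$ is the open disk bounded by $\gamma_1$ and $A$ is the open annulus between the two circles, all carrying the orientation of $\pi_0$.

```latex
% Hypothetical labels: \Gamma_1 = \gamma_1 (inner), \Gamma_2 = \gamma_2 (outer),
% \Lambda_1 = D_1 (inner disk), \Lambda_2 = A (annulus).
\begin{align*}
  T &= 2\,T_1 + 1\,T_2, &
  T_1 &= [\![ D_1 ]\!], &
  T_2 &= [\![ A ]\!], \\
  \partial T_1 &= [\![ \gamma_1 ]\!], &
  \partial T_2 &= [\![ \gamma_2 ]\!] - [\![ \gamma_1 ]\!], &
  \partial T &= 2[\![ \gamma_1 ]\!] + [\![ \gamma_2 ]\!] - [\![ \gamma_1 ]\!]
             = [\![ \gamma_1 ]\!] + [\![ \gamma_2 ]\!].
\end{align*}
% Coefficients: \sigma_{11} = 1, \sigma_{12} = -1, \sigma_{21} = 0, \sigma_{22} = 1.
% \gamma_1 is two-sided: p(1) = 1, n(1) = 2, Q_{n(1)} = 1 = Q_{p(1)} - 1,
%   and its points have multiplicity Q_{p(1)} - 1/2 = 3/2.
% \gamma_2 is one-sided: o(2) = 2, Q_{o(2)} = 1, and its points have multiplicity 1/2.
% T_{p(1)} + T_{n(1)} is the full disk bounded by \gamma_2, which is area minimizing.
```

In this example every boundary point is regular, so it only illustrates the bookkeeping of items (a)-(e), not the singular behaviour.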


Note that, as a simple consequence of Theorem 2 and the interior regularity
theory, we conclude that in every two-sided component $\Gamma_i$ of the boundary $\Gamma$ the
boundary singular points have dimension at most $m - 2$.

In view of the interior regularity results, one might be tempted to conjecture that
Theorem 1 is very suboptimal and that the Hausdorff dimension of $\mathrm{Sing}_b(T)$ is at
most $m - 2$. Though currently we do not have an answer to this question, let us stress
that at the boundary some new phenomena arise. Indeed, in [18], we can prove the
following:


Theorem 3 There are a smooth closed simple curve $\Gamma \subset \mathbb{R}^4$ and a mass
minimizing current $T$ in $\mathbb{R}^4$ such that $\partial T = [\![\Gamma]\!]$ and $\mathrm{Sing}_b(T)$ has an accumulation
point.


In particular Chang’s result, namely the discreteness of interior singular points 
for two dimensional mass minimizing currents, does not hold at the boundary. 
Actually the example can be modified in order to obtain also a sequence of interior 
singular points accumulating towards the boundary, see [18]. 


3 The Main Steps to Theorem 1 


In this section we outline the long road which is taken in [18] to prove Theorem 1.
We fix therefore $\Sigma$, $\Gamma$ and $T$ as in Theorem 1.


3.1 Reduction to Collapsed Points 


Recalling Allard’s monotonicity formula, we introduce at each boundary point
$p \in \Gamma$ the density $\Theta(T, p)$, namely the limit, as $r \downarrow 0$, of the normalized mass
ratio in the ball $B_r(p) \subset \mathbb{R}^{m+n}$ (in particular the normalization is chosen so that at
regular boundary points the density coincides with the one defined in the previous
section). Using a suitable variant of Almgren’s stratification theorem, we conclude
first that, except for a set of Hausdorff dimension at most $m - 2$, at any boundary
point $p$ there is a tangent cone which is flat, namely which is contained in an
$m$-dimensional plane $\pi \supset T_p\Gamma$. Secondly, using a classical Baire category argument,
we show that, for a dense subset of boundary points $p$, additionally to the existence
of a flat tangent cone, there is a sufficiently small neighborhood $U$ where the density
$\Theta(T, q)$ is bounded below, at any $q \in \Gamma \cap U$, by $\Theta(T, p)$. In particular the proof
of Theorem 1 is reduced to the claim that any such point, which we call collapsed,
is in fact regular.


3.2 The “Linear” Theory


Assume next that $0 \in \Gamma$ is a collapsed point and let $Q - \frac{1}{2}$ be its density. By
Allard’s boundary regularity theory for stationary varifolds, we know a priori that 0
is a regular point if $Q = 1$ and thus we can assume, without loss of generality, that
$Q \ge 2$. Fix a flat tangent cone at 0 and assume, up to rotations, that it is the plane
$\pi_0 = \mathbb{R}^m \times \{0\}$ and that $T_0\Gamma = \{x_m = 0\} \cap \pi_0$. Denote by $\pi_0^\pm$ the two half-planes
$\pi_0^\pm = \{\pm x_m > 0\} \cap \pi_0$. Assume for the moment that, at suitably chosen small
scales, the current $T$ is formed by $Q$ sheets over $\pi_0^+$ and $Q - 1$ sheets over $\pi_0^-$.
By a simple linearization argument such sheets must then be almost harmonic (in a
suitable sense).


Having this picture in mind, it is natural to develop a theory of $(Q - \frac{1}{2})$-valued
functions minimizing the Dirichlet energy. In order to explain the latter object
consider the projection $\gamma$ of $\Gamma$ onto $\pi_0$. On a sufficiently small disk $B_r(0) \cap \pi_0$,
$\gamma$ divides $\pi_0$ into two regions. A Lipschitz $(Q - \frac{1}{2})$-valued map consists of:


1. a Lipschitz $Q$-valued map (in the sense of Almgren, cf. [9]) $u^+$ on one side of $\gamma$
2. and a Lipschitz $(Q - 1)$-valued map $u^-$ on the other side,


satisfying the compatibility condition that the union of their graphs forms a current
whose boundary is the submanifold $\Gamma$ itself. A $(Q - \frac{1}{2})$-map will then be called
Dir-minimizing if it minimizes the sum of the Dirichlet energies of the two
“portions” $u^+$ and $u^-$ under the constraint that $\Gamma$ and the boundary values on
$\partial(B_r(0) \cap \pi_0)$ are both fixed.

The right counterpart of the “collapsed point situation” described above is the
assumption that all the $2Q - 1$ sheets meet at their common boundary $\Gamma$; under
such assumption we say that the $(Q - \frac{1}{2})$ Dir-minimizer has collapsed interface.
We then develop a suitable regularity theory for minimizers with collapsed interface.
First of all their Hölder continuity follows directly from the Ph.D. thesis of the third
author, cf. [25]. Secondly, the most important conclusion of our analysis is that a
minimizer can have collapsed interface only if it consists of a single harmonic sheet
“passing through” the boundary data, counted therefore with multiplicity $Q$ on one
side and with multiplicity $Q - 1$ on the other side.

The latter theorem is ultimately the deus ex machina of the entire argument
leading to Theorem 1. The underlying reason for its validity is that a monotonicity
formula for a suitable variant of Almgren’s frequency function holds. Given the
discussion of [26], such monotonicity can only be hoped for in the collapsed situation
and, remarkably, this suffices to carry on our program.

The validity of the monotonicity formula is clear when the collapsed interface 
is flat. However, when we have a curved boundary, a subtle yet important point 
becomes crucial: we cannot hope in general for the exact first variation identities 
which led Almgren to his monotonicity formula, but we must replace them with 
suitable inequalities. Moreover the latter can be achieved only if we adapt the 
frequency function by integrating a suitable weight. We illustrate this idea in a 
simpler setting in the next section. 


3.3 First Lipschitz Approximation 


A first use of the linear theory is approximating the current with the graph of
a Lipschitz $(Q - \frac{1}{2})$-valued map around collapsed points. The approximation is
then shown to be almost Dir-minimizing. Our approximation algorithm is a suitable
adaptation of the one developed in [10] for interior points. In particular, after adding 
an “artificial sheet”, we can directly use the Jerrard-Soner modified BV estimates of 
[10] to give a rather accurate Lipschitz approximation: the subtle point is to engineer 
the approximation so that it has collapsed interface. 


3.4 Height Bound and Excess Decay 


The previous Lipschitz approximation, together with the linear regularity theory, is
used to establish a power-law decay of the excess à la De Giorgi in a neighborhood
of a collapsed point. The effect of such a theorem is that the tangent cone is flat and
unique at every point $p \in \Gamma$ in a sufficiently small neighborhood of the collapsed
point $0 \in \Gamma$. Correspondingly, the plane $\pi(p)$ which contains such tangent cone is
Hölder continuous in the variable $p \in \Gamma$ and the current is contained in a suitable
horned neighborhood of the union of such $\pi(p)$.

An essential ingredient of our argument is an accurate height bound in a 
neighborhood of any collapsed point in terms of the spherical excess. The argument 
follows an important idea of Hardt and Simon in [23] and takes advantage of an 
appropriate variant of Moser’s iteration on varifolds, due to Allard, combined with 
a crucial use of the remainder in the monotonicity formula. The same argument has 
been also used by Spolaor in a similar context in [30], where he combines it with 
the decay of the energy for Dir-minimizers, cf. [30, Proposition 5.1 & Lemma 5.2]. 
3.5 Second Lipschitz Approximation


The decay of the excess proved in the previous step is used then to improve the 
accuracy of the Lipschitz approximation. In particular, by suitably decomposing the 
domain of the approximating map in a Whitney-type cubical decomposition which 
refines towards the boundary, we can take advantage of the interior approximation 
theorem of [10] on each cube and then patch the corresponding graphs together. 


3.6 Left and Right Center Manifolds 


The previous approximation result is combined with a careful smoothing and
patching argument to construct a “left” and a “right” center manifold $\mathcal{M}^+$ and $\mathcal{M}^-$.
The $\mathcal{M}^\pm$ are $C^3$ submanifolds of $\Sigma$ with boundary $\Gamma$ and they provide a good
approximation of the “average of the sheets” on both sides of $\Gamma$ in a neighborhood
of the collapsed point $0 \in \Gamma$. They can be glued together to form a $C^{1,1}$ submanifold
$\mathcal{M}$ which “passes through $\Gamma$”. Each portion has $C^{3,\kappa}$ estimates up to the boundary,
but we only know that the tangent spaces at the boundary coincide: we have a priori
no information on the higher derivatives. The construction algorithm follows closely
that of [12] for the interior, but some estimates must be carefully adapted in order to
ensure the needed boundary regularity.

The center manifolds are coupled with two suitable approximating maps $N^\pm$.
The latter take values on the normal bundles of $\mathcal{M}^\pm$ and provide an accurate
approximation of the current $T$. Their construction is a minor variant of the one
in [12].


3.7 Monotonicity of the Frequency Function and Final 
Blow-Up Argument 


After constructing the center manifolds and the corresponding approximations we
use a suitable Taylor expansion of the area functional to show that the monotonicity
of the frequency function holds for the approximating maps $N^\pm$ as well.

We then complete the proof of Theorem 1: in particular we show that, if 0
were a singular collapsed point, suitable rescalings of the approximating maps $N^\pm$
would produce, in the limit, a $(Q - \frac{1}{2})$ Dir-minimizer violating the linear regularity
theory. On the one hand the estimate on the frequency function plays a primary role
in showing that the limiting map is nontrivial. On the other hand the properties of
the center manifolds $\mathcal{M}^\pm$ enter in a fundamental way in showing that the average
of the sheets of the limiting $(Q - \frac{1}{2})$ map is zero on both sides.
4 Weighted Monotonicity Formulae


In this section we want to illustrate in a simple situation an idea which, in
spite of being elementary, plays a fundamental role in our proof of Theorem 1:
boundary monotonicity formulae can be derived from the arguments of their interior
counterparts provided we introduce a suitable weight.

Let $\Gamma$ be an $(m - 1)$-dimensional submanifold of $\mathbb{R}^{m+n}$. We consider an
$m$-dimensional varifold $V$ in $\mathbb{R}^{m+n} \setminus \Gamma$ and assume it is stationary in $\mathbb{R}^{m+n} \setminus \Gamma$. Allard
in [2] derived his famous monotonicity formula at the boundary under the additional
assumption that the density of $V$ has a uniform positive lower bound. His proof
consists of two steps: he first derives a suitable representation for the first variation
$\delta V$ of $V$ along general vector fields of $\mathbb{R}^{m+n}$, i.e., vector fields which might be
nonzero on $\Gamma$. He then follows the derivation of the interior monotonicity formula,
i.e., he tests the first variation along suitable radial vector fields. His proof needs the
lower density assumption in the first part and although the latter can be removed (cf.
Allard, W.K., Personal communication), the resulting argument is rather laborious.

We introduce here varifolds which are stationary along “tangent fields”: 


Definition 2 Consider an $m$-dimensional varifold $V$ in an open set $U \subset \mathbb{R}^{m+n}$ and
let $\Gamma$ be a $k$-dimensional $C^1$ submanifold of $U$. We say that $V$ is stationary with
respect to vector fields tangent to $\Gamma$ if


\[ \delta V(\chi) = 0 \quad \text{for all } \chi \in C_c^1(U, \mathbb{R}^{m+n}) \text{ which are tangent to } \Gamma. \tag{2} \]


Clearly, when $k = m - 1$, the condition above is stronger than that used by Allard
in [2], where $\chi$ is assumed to vanish on $\Gamma$. On the other hand our condition is the
natural one satisfied by classical minimal surfaces with boundary $\Gamma$, since the
one-parameter family of isotopies generated by $\chi$ maps $\Gamma$ onto itself. When $k > m - 1$,
the condition is the one satisfied by classical “free-boundary” minimal surfaces,
namely minimal surfaces with boundary contained in $\Gamma$ and meeting it orthogonally.
In the context of varifolds, the latter have been considered by Grüter and Jost in [22],
where the two authors derived also an analog of Allard’s monotonicity formula.
In this section we show how one can take advantage of a suitable distortion of
the Euclidean balls to give a (rather elementary) unified approach to monotonicity
formulae in both contexts.


Definition 3 Assume that $0 \in \Gamma$. We say that the function $d : \mathbb{R}^{m+n} \to \mathbb{R}$ is
a distortion of the distance function adapted to $\Gamma$ if the following two conditions
hold:


(a) $d$ is of class $C^2$ on $\mathbb{R}^{m+n} \setminus \{0\}$ and $D^j d(x) = D^j |x| + O(|x|^{1-j+\alpha})$ for some
fixed $\alpha \in (0, 1]$ and for $j = 0, 1, 2$;
(b) $\nabla d$ is tangent to $\Gamma$.


The following lemma is a simple consequence of the Tubular Neighborhood 
Theorem and it is left to the reader. 
Lemma 1 If $\Gamma$ is of class $C^3$ then there is a distortion of the distance function
adapted to $\Gamma$ where the exponent $\alpha$ of Definition 3(a) can be taken to be 1.


The main point of our discussion is then the argument given below for the 
following 


Theorem 4 Consider $\Gamma$ and $V$ as in Definition 2, assume that $0 \in \Gamma$ and that $d$ is
a distorted distance function adapted to $\Gamma$. Let $\varphi \in C_c^1([0, 1))$ be a nonincreasing
function which is constant in a neighborhood of the origin. If $\alpha$ is the exponent of
Definition 3(a), then there are positive constants $C$ and $\rho$ such that the following
inequality holds for every positive $s < \rho$


\[ \frac{d}{ds}\left[ e^{Cs^\alpha} s^{-m} \int \varphi\left(\frac{d(x)}{s}\right) d\|V\|(x) \right]
\ge - e^{Cs^\alpha} s^{-m-1} \int \varphi'\left(\frac{d(x)}{s}\right) \frac{d(x)}{s} \left| P_{\pi^\perp}\left(\frac{\nabla d(x)}{|\nabla d(x)|}\right) \right|^2 dV(x, \pi) \tag{3} \]


(where $P_\tau$ denotes the orthogonal projection on the subspace $\tau$).


Note that if we let $\varphi$ converge to the indicator function of the interval $[0, 1)$ we
easily conclude that


\[ s \mapsto \Phi(s) := e^{Cs^\alpha}\, \frac{\|V\|(\{d < s\})}{s^m} \]


is monotone nondecreasing: indeed, for $\rho > s > r > 0$, the difference $\Phi(s) -
\Phi(r)$ controls the integral of a suitable nonnegative expression involving $d$ and the
projection of $\nabla d/|\nabla d|$ over $\pi^\perp$. When $d(x) = |x|$, namely when $\Gamma$ is flat, the
exponential weight disappears (i.e., the constant $C$ might be taken to be 0), the
inequality becomes an equality and (in the limit of $\varphi \uparrow \mathbf{1}_{[0,1)}$) we recover Allard’s
identity


\[ \frac{\|V\|(B_s(0))}{\omega_m s^m} - \frac{\|V\|(B_r(0))}{\omega_m r^m} = \frac{1}{\omega_m} \int_{B_s(0) \setminus B_r(0)} \frac{|P_{\pi^\perp}(x)|^2}{|x|^{m+2}}\, dV(x, \pi). \]


In particular, since $d$ is asymptotic to $|x|$, all the conclusions which are usually
derived from Allard’s theorem (existence of the density and its upper semicontinuity,
conicity of the tangent varifolds, Federer’s reduction argument and Almgren’s
stratification) can be derived from Theorem 4 as well. Moreover, the argument given
below can be easily extended to cover the more general situation of varifolds with
mean curvature satisfying a suitable integrability condition.


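Proof (of Theorem 4) Consider the vector field


\[ X_s(x) = \varphi\left(\frac{d(x)}{s}\right) d(x)\, \frac{\nabla d(x)}{|\nabla d(x)|^2}. \]


$X_s$ is obviously $C^1$ on $\mathbb{R}^{m+n} \setminus \{0\}$ and moreover we have


\[ DX_s = \varphi\left(\frac{d}{s}\right) \left[ \frac{\nabla d \otimes \nabla d}{|\nabla d|^2} + d\, \frac{D^2 d}{|\nabla d|^2} - 2d\, \frac{\nabla d \otimes (D^2 d \cdot \nabla d)}{|\nabla d|^4} \right]
+ \varphi'\left(\frac{d}{s}\right) \frac{d}{s}\, \frac{\nabla d \otimes \nabla d}{|\nabla d|^2}. \]


From the above formula, using that $\varphi$ is constant in a neighborhood of the origin
and Definition 3(a), we easily infer that (for every fixed $s$)


\[ DX_s(x) = \varphi\left(\frac{d(x)}{s}\right) \mathrm{Id} + O(|x|^\alpha). \]


In particular $X_s$ is $C^1$, compactly supported in $U$ (provided $s$ is sufficiently small),
and tangent to $\Gamma$. Thus


\[ 0 = \delta V(X_s) = \int \mathrm{div}_\pi X_s(x)\, dV(x, \pi). \]


Fix next an orthonormal basis $e_1, \ldots, e_m$ of $\pi$ and use Definition 3(a) to compute


\begin{align*}
\mathrm{div}_\pi X_s &= \sum_{i=1}^{m} e_i^T \cdot DX_s \cdot e_i = (m + O(s^\alpha))\, \varphi\left(\frac{d}{s}\right) + \varphi'\left(\frac{d}{s}\right) \frac{d}{s} \sum_{i=1}^{m} \frac{(\nabla d \cdot e_i)^2}{|\nabla d|^2} \\
&= (m + O(s^\alpha))\, \varphi\left(\frac{d}{s}\right) + \varphi'\left(\frac{d}{s}\right) \frac{d}{s} \left( 1 - \left| P_{\pi^\perp}\left(\frac{\nabla d}{|\nabla d|}\right) \right|^2 \right).
\end{align*}


Plugging the latter identity in the first variation condition we achieve the following
inequality for a sufficiently large constant $C$:


\[ - \int \left( m\, \varphi\left(\frac{d}{s}\right) + \varphi'\left(\frac{d}{s}\right) \frac{d}{s} \right) d\|V\|(x) + C\alpha s^\alpha \int \varphi\left(\frac{d}{s}\right) d\|V\|
\ge - \int \varphi'\left(\frac{d(x)}{s}\right) \frac{d(x)}{s} \left| P_{\pi^\perp}\left(\frac{\nabla d(x)}{|\nabla d(x)|}\right) \right|^2 dV(x, \pi). \]


Multiplying both sides of the inequality by $e^{Cs^\alpha} s^{-m-1}$ we then conclude (3).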
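
For readers who want the last step spelled out, the following worked computation (a sketch, assuming differentiation under the integral sign is justified, which holds for $\varphi \in C_c^1$ and $\|V\|$ locally finite) shows why multiplying the left-hand side by $e^{Cs^\alpha} s^{-m-1}$ produces exactly the derivative appearing in (3). Here $d$ inside the integrals denotes the function $d(x)$, not the differential.

```latex
\begin{align*}
\frac{d}{ds}\left[ e^{Cs^\alpha} s^{-m} \int \varphi\Big(\frac{d(x)}{s}\Big)\, d\|V\|(x) \right]
 &= e^{Cs^\alpha}\Big( C\alpha s^{\alpha-1} s^{-m} - m s^{-m-1} \Big) \int \varphi\Big(\frac{d}{s}\Big)\, d\|V\|
    - e^{Cs^\alpha} s^{-m} \int \varphi'\Big(\frac{d}{s}\Big) \frac{d}{s^2}\, d\|V\| \\
 &= e^{Cs^\alpha} s^{-m-1} \left[ C\alpha s^{\alpha} \int \varphi\Big(\frac{d}{s}\Big)\, d\|V\|
    - \int \Big( m\, \varphi\Big(\frac{d}{s}\Big) + \varphi'\Big(\frac{d}{s}\Big) \frac{d}{s} \Big)\, d\|V\| \right].
\end{align*}
```

The bracket in the second line is the left-hand side of the inequality obtained from the first variation, so the factor $C\alpha s^\alpha$ there is precisely what the exponential weight $e^{Cs^\alpha}$ absorbs.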
Optimal Transport with Discrete Mean
Field Interaction


Jiakun Liu and Grégoire Loeper 


Abstract In this note, we summarise some regularity results recently obtained for 
an optimal transport problem where the matter transported is either accelerated by 
an external force field, or self-interacting, at a given intermediate time. 


1 Background 


This note is a summary of an ongoing work [5]. The motivation comes from a
previous work by the second author [6], where he studies the motion of a
self-gravitating matter, classically described by the Euler-Poisson system. Letting $\rho$ be
the density of the matter, the gravitational field generated by a continuum of matter
with density $\rho$ is the gradient of a potential $p$ linked to $\rho$ by a Poisson coupling.
The system is thus the following


\[ \begin{cases} \partial_t \rho + \nabla \cdot (\rho v) = 0, \\ \partial_t (\rho v) + \nabla \cdot (\rho v \otimes v) = -\rho \nabla p, \\ \Delta p = \rho. \end{cases} \tag{1} \]


A well known problem in cosmology, named the reconstruction problem, is to
find a solution to (1) satisfying


\[ \rho|_{t=0} = \rho_0, \qquad \rho|_{t=T} = \rho_T. \]
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In [6], the reconstruction problem was formulated into a minimisation problem, 
minimising the action of the Lagrangian which is a convex functional. Through 
this variational formulation, the reconstruction problem becomes very similar to 
the time continuous formulation of the optimal transportation problem of Benamou 
and Brenier [1], and the existence, uniqueness of the minimiser was obtained by 
use of the Monge-Kantorovich duality. In the context of optimal transport as in [6], 
there holds v = V@ for some potential ¢, and the author obtained partial regularity 
results for @ and p, as well as the consistency of the minimiser with the solution of 
the Euler-Poisson system. 

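The optimal transport problem of [6] was formulated as finding minimisers of 
the action of the Lagrangian 

\[ I(\rho, v, p) = \frac12 \int_0^T\!\!\int_{\mathbb{T}^d} \rho(t,x)|v(t,x)|^2 + |\nabla p(t,x)|^2 \,dx\,dt, \tag{2} \]

over all $\rho$, $p$, $v$ satisfying 

\[
\begin{aligned}
\partial_t\rho + \nabla\cdot(\rho v) &= 0,\\
\rho(0) = \rho_0, \quad \rho(T) &= \rho_T,\\
\Delta p &= \rho,
\end{aligned}
\]

where $\mathbb{T}^d$ denotes the d-dimensional torus, as the study in [6] was 
performed in the space-periodic case. 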

In the work [5] we address the more general problem of finding minimisers for 
the action 

\[ I(\rho, v, p) = \frac12 \int_0^T\!\!\int_{\mathbb{T}^d} \rho(t,x)|v(t,x)|^2 + \mathcal{F}(\rho(t,x)) \,dx\,dt, \tag{3} \]

for more general $\mathcal{F}$. The problem (2) falls in this class. In [3], Lee 
and McCann address the case where 

\[ \mathcal{F}(\rho) = -\int \rho(t,x)V(t,x)\,dx. \]


(Note that in the context of classical mechanics $\mathcal{F}$ would be the 
potential energy.) This Lagrangian corresponds to the case of a continuum of 
matter evolving in an external force field given by $\nabla V(t, x)$. We call 
this the non-interacting case for obvious reasons. This can be recast as a 
classic optimal transport problem, where the cost functional is given by 

\[ c(x, y) = \inf_{\substack{\gamma(0)=x,\ \gamma(T)=y \\ \gamma\in C^1([0,T],\mathbb{R}^d)}} \int_0^T \frac12|\dot\gamma(t)|^2 - V(t,\gamma(t))\,dt. \tag{4} \]

For a small $V$ satisfying some structure condition, they obtain that $c$ 
satisfies the conditions found in [8] to ensure the regularity of the optimal map. 


2 Time Discretisation 


In [5] we restrict ourselves to the case where the force field only acts at a 
single discrete time between 0 and $T$: 

\[ V(t, x) = \delta_{t=T/2}\, V(x). \]

We will call this case the “discrete” case. The minimisation problem therefore 
becomes 

\[ I(\rho, v) = \frac12\int_0^T\!\!\int_{\mathbb{R}^d} \rho(t,x)|v(t,x)|^2\,dx\,dt + \int_{\mathbb{R}^d} \rho(T/2,x)\,Q(x)\,dx, \tag{5} \]

for some potential $Q$. This will allow us to remove the smallness condition on 
$V$. Moreover, we will be able to extend our result to the mean-field case, 
where the force field is given by 

\[ \nabla V(x) = \int \rho(t, y)\,\nabla\kappa(x - y)\,dy. \tag{6} \]


This corresponds to the case where a particle located at $x$ attracts or repels 
another particle located at $y$ with a force equal to $\nabla\kappa(x - y)$. We 
will give a sufficient condition on $\kappa$ to ensure a smooth transport map and 
intermediate density. In particular, we consider the gravitational case, which 
corresponds to the Coulomb kernel 

\[ \kappa(x - y) = \frac{c_d}{|x - y|^{d-2}}, \]

that corresponds to the potential energy 

\[ \mathcal{E}(\rho) = -\mathcal{F}(\rho) = -\frac12\iint \rho(x)\,\kappa(x-y)\,\rho(y)\,dx\,dy = -\frac12\int |\nabla p|^2, \]

where $\Delta p = \rho$. 


One sees straight away that between time 0 and $T/2$ we are solving the usual 
optimal transport problem in its “Benamou-Brenier” formulation [1], as well as 
between $T/2$ and $T$. More generally, as done in [6], one can consider 
multiple-steps time discretisation, where the potential energy term contributes 
only at time 


iT 


Between two time steps, the problem will be an optimal transport problem as in 
[1, 2] and [9]. Then at each time step, the gravitational effect will be taken into 
account, and the velocity will be discontinuous. From a Lagrangian point of view, 
the velocity of each particle will therefore be a piecewise constant function with 
respect to time. Then letting the time step go to 0, one will eventually recover the 
time continuous problem. 


3 Main Results 


Let us consider a two-step time discretisation in the interval $[0, T]$: at 
$t = T/2$, the velocity is changed by an amount equal to $\nabla Q$, the gradient 
of a potential $Q$. The initial density $\rho_0$ is supported on a bounded domain 
$\Omega_0 \subset \mathbb{R}^d$, and the final density $\rho_T$ is supported on a 
bounded domain $\Omega_T \subset \mathbb{R}^d$, satisfying the balance condition 

\[ \int_{\Omega_0} \rho_0(x)\,dx = \int_{\Omega_T} \rho_T(y)\,dy. \tag{7} \]


As is always the case in solving problems of the form (3), the velocity $v$ is 
the gradient of a potential, and we let $\phi$ be the velocity potential at time 
0, i.e. $v(0, x) = \nabla\phi(x)$. At time $t = T/2$, $v$ will be changed into 
$v + \nabla Q$ and one can see that for an initial point $x \in \Omega_0$, the 
final point $y = m(x) \in \Omega_T$ is given by 

\[ m(x) = x + T\nabla\phi + \frac{T}{2}\nabla Q\left(x + \frac{T}{2}\nabla\phi\right). \]
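
The two-step motion behind this formula can be checked numerically. The following is a minimal Python sketch, restricted to one space dimension for illustration: a particle flies freely up to time T/2, receives the kick given by the gradient of Q, and flies freely again. The names `transport_map`, `closed_form`, `grad_phi`, `grad_Q` and the quadratic test potentials are assumptions made for this example only; they are not taken from [5].

```python
def transport_map(x, grad_phi, grad_Q, T):
    """Final position m(x) of a particle starting at x (1-D illustration)."""
    v0 = grad_phi(x)              # initial velocity v(0, x)
    x_mid = x + 0.5 * T * v0      # free flight on [0, T/2]
    v1 = v0 + grad_Q(x_mid)       # velocity after the kick at t = T/2
    return x_mid + 0.5 * T * v1   # free flight on [T/2, T]


def closed_form(x, grad_phi, grad_Q, T):
    """m(x) = x + T phi'(x) + (T/2) Q'(x + (T/2) phi'(x))."""
    return x + T * grad_phi(x) + 0.5 * T * grad_Q(x + 0.5 * T * grad_phi(x))


# Hypothetical test potentials: phi(x) = x^2 / 4 and Q(z) = z^2.
grad_phi = lambda x: 0.5 * x
grad_Q = lambda z: 2.0 * z

m_stepwise = transport_map(1.0, grad_phi, grad_Q, T=2.0)
m_formula = closed_form(1.0, grad_phi, grad_Q, T=2.0)
```

Both evaluations agree because the closed formula is the composition of the two free flights with the velocity jump in between.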


By computing the determinant of the Jacobian $Dm$ and noting that $m$ pushes 
forward $\rho_0$ to $\rho_T$, one can derive the equation for $\phi$. To be 
specific, define a modified potential 

\[ \tilde\phi(x) := \frac{T}{2}\phi(x) + \frac12|x|^2, \quad \text{for } x \in \Omega_0. \tag{8} \]

It is readily seen [1, 2, 9] that the modified potential $\tilde\phi$ is a convex 
function. Since $m_\#\rho_0 = \rho_T$, we obtain that $\tilde\phi$ satisfies a 
Monge-Ampère type equation 

\[ \det\left[D^2\tilde\phi - \left(D^2\tilde Q(\nabla\tilde\phi)\right)^{-1}\right] = \frac{1}{\det D^2\tilde Q(\nabla\tilde\phi)}\,\frac{\rho_0}{\rho_T\circ m}, \tag{9} \]

where $\tilde Q$ is a modified potential given by 

\[ \tilde Q(z) := \frac{T}{2}Q(z) + |z|^2, \tag{10} \]

with an associated natural boundary condition 

\[ m(\Omega_0) = \Omega_T. \tag{11} \]
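
As a sanity check on the change of unknowns, the sketch below compares the transport map written with the original potentials against the same map written with the modified ones. It assumes the modified potentials are the velocity potential scaled by T/2 plus |x|^2/2, and Q scaled by T/2 plus |z|^2, in which case the map equals the gradient of the modified Q evaluated at the gradient of the modified velocity potential, minus x. The function names and the test potentials are hypothetical, chosen only for this one-dimensional illustration.

```python
import math


def m_direct(x, grad_phi, grad_Q, T):
    """Transport map in terms of the original potentials phi and Q."""
    return x + T * grad_phi(x) + 0.5 * T * grad_Q(x + 0.5 * T * grad_phi(x))


def m_modified(x, grad_phi, grad_Q, T):
    """Same map, written with the (assumed) modified potentials."""
    # Gradient of phi_tilde(x) = (T/2) phi(x) + x^2 / 2.
    grad_phi_tilde = 0.5 * T * grad_phi(x) + x
    # Gradient of Q_tilde(z) = (T/2) Q(z) + z^2, evaluated at grad_phi_tilde.
    grad_Q_tilde = 0.5 * T * grad_Q(grad_phi_tilde) + 2.0 * grad_phi_tilde
    return grad_Q_tilde - x


# Hypothetical, non-quadratic test data: phi'(x) = sin(x), Q'(z) = z^3.
sample_points = [-1.0, -0.2, 0.0, 0.7, 1.5]
max_gap = max(
    abs(m_direct(x, math.sin, lambda z: z ** 3, 1.5)
        - m_modified(x, math.sin, lambda z: z ** 3, 1.5))
    for x in sample_points
)
```

With these definitions the Jacobian of the map factorises through the Hessian of the modified Q, which is the structure visible in the Monge-Ampère type equation (9).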


For regularity of the solution $\tilde\phi$ to the boundary value problem (9) and 
(11) (equivalently that of $\phi$), it is necessary to impose certain conditions 
on the potential energy function $\tilde Q$ (equivalently on $Q$) and the domains 
$\Omega_0$, $\Omega_T$. In [5] we assume that $\tilde Q$ satisfies the following 
conditions: 


(H0) The function $\tilde Q$ is smooth enough, say at least $C^4$, 
(H1) The function $\tilde Q$ is uniformly convex, namely $D^2\tilde Q \ge \varepsilon_0 I$ for some $\varepsilon_0 > 0$, 
(H2) The function $\tilde Q$ satisfies that for all $\xi, \eta \in \mathbb{R}^d$ with $\xi \perp \eta$, 

\[ \sum_{i,j,k,l,p,q,r,s} \left(D^4_{ijrs}\tilde Q - 2\tilde Q^{pq} D^3_{ijp}\tilde Q\, D^3_{qrs}\tilde Q\right) \tilde Q^{rk}\tilde Q^{sl}\, \xi_k\xi_l\eta_i\eta_j \le -\delta_0 |\xi|^2|\eta|^2, \tag{12} \]

where $\{\tilde Q^{ij}\}$ is the inverse of $\{\tilde Q_{ij}\}$, and $\delta_0$ is a 
positive constant. When $\delta_0 = 0$, we call it (H2w), a weak version of (H2). 


Note that conditions (H0) and (H1) imply that the inverse matrix 
$(D^2\tilde Q)^{-1}$ exists, and ensure that Eq. (9) is well defined. Condition 
(H2) is an analogue of the Ma-Trudinger-Wang condition [8] in optimal 
transportation, which is necessary for regularity results. We also use the notion 
of Q-convexity of domains as in [8]. 

Our first main result is the following 


Theorem 1 Let $\phi$ be the velocity potential in the reconstruction problem. 
Assume the gravitational function $\tilde Q$ satisfies conditions (H0), (H1) and 
(H2), and $\Omega_T$ is Q-convex with respect to $\Omega_0$. Assume that 
$\rho_T \ge c_0$ for some positive constant $c_0$, 
$\rho_0 \in L^p(\Omega_0)$ for some $p > \frac{d+1}{2}$, and the balance condition 
(7) is satisfied. Then the velocity potential $\phi$ is 
$C^{1,\alpha}(\overline{\Omega}_0)$ for some $\alpha \in (0, 1)$. 

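If furthermore, $\Omega_0$, $\Omega_T$ are $C^4$ smooth and uniformly Q-convex 
with respect to each other, $\rho_0 \in C^2(\overline{\Omega}_0)$, 
$\rho_T \in C^2(\overline{\Omega}_T)$, then $\phi \in C^3(\overline{\Omega}_0)$, 
and higher regularity follows from the theory of linear elliptic equations. In 
particular, if $\tilde Q$, $\Omega_0$, $\Omega_T$, $\rho_0$, $\rho_T$ are 
$C^\infty$, then the velocity potential $\phi \in C^\infty(\overline{\Omega}_0)$. 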


The proof of Theorem 1 is done by linking the time discretisation problem to 
a transport problem, where the key observation is that the cost function 
$c(x, y)$ is given by $\tilde Q^*(x + y)$, where $\tilde Q^*$ is the Legendre 
transform of the gravitational function $\tilde Q$. Under this formulation, the 
regularity then follows from the established theory of optimal transportation, 
see for example [4, 7, 8, 10] and references therein. 

Our second main result is the following: 


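Theorem 2 Assume that $Q$ is given by 

\[ Q(x) = \frac12\int_{\Omega_{T/2}} \rho(T/2, y)\,\kappa(x - y)\,dy, \tag{13} \]

where $\Omega_{T/2} = (\mathrm{Id} + \frac{T}{2}\nabla\phi)(\Omega_0)$ is the 
intermediate domain at $t = \frac{T}{2}$, and that $\kappa$ satisfies conditions 
(H0), (H1) and 


(H2c) for any $\xi, \eta \in \mathbb{R}^d$, $x, y \in \Omega_{T/2}$, 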


» (Di. = y)) ee EE Nin; <0, 


ik, pqs 


where {k'} is the inverse of {ij + rad 


We also assume some geometric conditions on the domains. Then the results of 
Theorem 1 remain true. 


The proof of Theorem 2 relies on the observation that (H2c) implies (H2), and is 
preserved under convex combinations, and therefore by convolution with the density 
$\rho(T/2)$, and on some a priori $C^1$ estimates on the potential. Full details 
and further remarks are contained in our work [5]. 


A Sixth Order Curvature Flow of Plane Curves with Boundary Conditions 


James McCoy, Glen Wheeler, and Yuhan Wu 


Abstract We show that small energy curves under a particular sixth order curvature 
flow with generalised Neumann boundary conditions between parallel lines converge 
exponentially in the $C^\infty$ topology in infinite time to straight line segments. 


1 Introduction 


Higher order geometric evolution problems have received increasing attention in 
the last few years. Particular geometric fourth order equations occur in physical 
problems and enjoy some interesting applications in mathematics. We mention in 
particular for curves the curve diffusion flow and L?-gradient flow of the elastic 
energy, and for surfaces the surface diffusion and Willmore flows. Flows of higher 
even order than four have been less thoroughly investigated, but motivation for them 
and their elliptic counterparts comes for example from computer design, where 
higher order equations are desirable as they allow more flexibility in terms of 
prescribing boundary conditions [8]. Such equations have also found applications 
in medical imaging [10]. 

In this article we are interested in curves $\gamma$ meeting two parallel lines 
with Neumann (together with other) boundary conditions evolving under the $L^2$ 
gradient flow for the energy 

\[ \int_\gamma k_s^2\,ds. \]

Here $k_s$ denotes the first derivative of curvature with respect to the arc 
length parameter $s$. Particularly relevant to us is the corresponding 
consideration of the curve diffusion and elastic flow in this setting in [12]. 
Other relevant works on fourth order flow of curves with boundary conditions are 
[1, 2, 7]. Of course if one instead considers closed curves without boundary 
evolving by higher order equations, these have been more thoroughly studied; we 
mention in particular [3–5, 9, 11]. 

The remainder of this article is organised as follows. In Sect. 2 we describe 
the set-up of our problem, the normal variation of the energy and the boundary 
conditions. We define our corresponding gradient flow, discuss local existence and 
give the relevant evolution equations of various geometric quantities. We also state 
our main theorem in this part, Theorem 2.2. In Sect. 3 we state the relevant tools 
from analysis to be used including an interpolation inequality valid in our setting. 
Under the small energy condition (7) below, we show that the winding number of 
curves under our flow is constant and remains equal to zero. We show further that 
under this condition the length of the curve does not increase and the curvature and 
all curvature derivatives in $L^2$ are bounded under the flow. That these bounds are 
independent of time implies solutions exist for all time. In Sect. 4 we show under 
a smaller energy assumption that in fact the $L^2$ norm of the second derivative of 
curvature decays exponentially under the flow. As a corollary we obtain uniform 
pointwise exponential decay of curvature and all curvature derivatives to zero. A 
stability argument shows that the solution converges to a unique horizontal line 
segment. The exponential convergence of the flow speed allows us to describe the 
bounded region in which the solution remains under the flow. 


2 The Set-Up 


Let $\gamma_0 : [-1,1] \to \mathbb{R}^2$ be a (suitably) smooth embedded (or 
immersed) regular curve. Denote by $ds$ the arc length element and $k$ the 
(scalar) curvature. We consider the energy functional 

\[ E[\gamma] = \frac12\int_\gamma k_s^2\,ds \]

where $k_s$ is the derivative of curvature with respect to arc length. We are 
interested in the $L^2$ gradient flow for curves of small initial energy with 
Neumann boundary conditions. 

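Under the normal variation $\tilde\gamma = \gamma + \varepsilon F\nu$ a 
straightforward calculation yields 

\[ \frac{d}{d\varepsilon} E[\tilde\gamma]\Big|_{\varepsilon=0} = -\int_\gamma \left(k_{s^4} + k^2k_{ss} - \frac12 k k_s^2\right) F\,ds + \Big[k_sF_{ss} - k_{ss}F_s + \left(k_{sss} + k^2k_s\right)F\Big]_{\partial\gamma}, \tag{1} \]

where $k_{s^4} = k_{ssss}$. 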
‘Natural boundary conditions’ for the corresponding $L^2$-gradient flow would 
ensure that the above boundary term is equal to zero. However, this term is rather 
complicated. In view of the first term in (1), we wish to take 

\[ F = k_{s^4} + k^2k_{ss} - \frac12 k k_s^2 \tag{2} \]

and the corresponding gradient flow 

\[ \frac{\partial\gamma}{\partial t} = F\nu. \tag{3} \]


Differentiating the Neumann boundary condition (see also [12, Lemma 2.5] for 
example) implies 

\[ 0 = -F_s(\pm1, t) = -k_{s^5} - k k_s k_{ss} - k^2k_{sss} + \frac12 k_s^3. \tag{4} \]

As in previous work, we will assume the ‘no curvature flux condition’ at the 
boundary, 

\[ k_s(\pm1, t) = 0. \tag{5} \]

The boundary terms in (1) then disappear if we choose, for example, 

\[ k_{sss}(\pm1, t) = 0. \tag{6} \]

This is in a way a natural choice because Eq. (4) then implies 
$k_{s^5}(\pm1, t) = 0$. In fact by an induction argument we have 


Lemma 2.1 With Neumann boundary conditions and also (5) and (6) satisfied, a 
solution to the flow (3) satisfies $k_{s^{2\ell+1}} = 0$ on the boundary for 
$\ell \in \mathbb{N}$. 


Let us now state precisely the flow problem. 

Let $\eta_\pm(\mathbb{R})$ denote two parallel vertical lines in $\mathbb{R}^2$, 
with distance between them $|e|$. We consider a family of plane curves 
$\gamma : [-1, 1] \times [0, T) \to \mathbb{R}^2$ satisfying the evolution Eq. (3) 
with normal speed given by (2), boundary conditions 

\[ \gamma(\pm1, t) \in \eta_\pm(\mathbb{R}), \qquad \langle\nu, \nu_{\eta_\pm}\rangle(\pm1, t) = 0, \qquad k_s(\pm1, t) = k_{sss}(\pm1, t) = 0 \]

and initial condition 

\[ \gamma(\cdot, 0) = \gamma_0(\cdot) \]

for initial smooth regular curve $\gamma_0$. 


Theorem 2.2 There exists a universal constant $C > 0$ such that the following 
holds. For the flow problem described above, if the initial curve $\gamma_0$ 
satisfies $\omega = 0$ and 


V5129 — 67 
b= ee m* — IIks|I3 Lo > 0, (7) 


where $L_0$ is the length of $\gamma_0$, then the solution exists for all time 
$T = \infty$ and converges exponentially to a horizontal line segment 
$\gamma_\infty$ with $\mathrm{dist}(\gamma_\infty, \gamma_0) < C/\delta$. 


In the above statement and throughout the article we use $\omega$ to denote the 
winding number, defined here as 

\[ \omega := \frac{1}{2\pi}\int_\gamma k\,ds. \]
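
For a discretised curve, the integral of curvature is the total turning angle of the tangent, so the winding number can be approximated by summing the signed exterior angles of a polyline. The following Python sketch illustrates this; the function name `winding_number` and the sample curves are assumptions made for the example, and the polyline value only approximates the smooth one.

```python
import math


def winding_number(points):
    """Total signed turning angle of an open polyline, divided by 2*pi."""
    total = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        ax, ay = x1 - x0, y1 - y0      # incoming edge
        bx, by = x2 - x1, y2 - y1      # outgoing edge
        # Signed angle from the incoming to the outgoing edge.
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    return total / (2.0 * math.pi)


# A straight segment has zero winding number.
straight = [(float(i), 0.0) for i in range(5)]
omega_straight = winding_number(straight)

# An upper half circle, traversed counterclockwise, turns by about pi.
n = 1000
half_circle = [(math.cos(math.pi * j / n), math.sin(math.pi * j / n))
               for j in range(n + 1)]
omega_half = winding_number(half_circle)
```

The half circle gives a value close to one half, in line with the observation that curves meeting both lines orthogonally can only have winding number a multiple of one half.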


Remarks 


• The condition (7) is not optimal. By a standard argument it can be weakened for 
example to the requirement of Lemma 3.4, namely 


3 
4 
> ~ Wks 3 £9 > 0. 


Details of this argument will appear in a future article. It is an open question 
whether the requirement can be further weakened. 
• The exponential decay facilitates an explicit estimate on the distance of 
$\gamma_\infty$ to $\gamma_0$. 


Local existence of a smooth regular curve solution 
$\gamma : [-1, 1] \times [0, T) \to \mathbb{R}^2$ to the flow problem is standard. 
If $\gamma_0$ also satisfies compatibility conditions, then the solution is smooth 
on $[0, T)$. In this article we focus on the case of smooth initial $\gamma_0$. 
However, $\gamma_0$ may be much less smooth; Eq. (3) is smoothing. We do not 
pursue this here. 

Similarly as in [12] and elsewhere we may derive the following: 


Lemma 2.3 Under the flow (3) we have the following evolution equations 
(i) $\frac{\partial}{\partial t}ds = -kF\,ds$; 

(ii) $\frac{\partial}{\partial t}k = F_{ss} + k^2F$; 

(iii) $\frac{\partial}{\partial t}k_s = F_{sss} + k^2F_s + 3kk_sF$; 
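(iv) 
\[
\begin{aligned}
\frac{\partial}{\partial t}k_{s^\ell} = k_{s^{\ell+6}} &+ \sum_{q+r+u=\ell} \Big( c^1_{qru}\, k_{s^{q+4}}k_{s^r}k_{s^u} + c^2_{qru}\, k_{s^{q+3}}k_{s^{r+1}}k_{s^u} \\
&\qquad + c^3_{qru}\, k_{s^{q+2}}k_{s^{r+2}}k_{s^u} + c^4_{qru}\, k_{s^{q+2}}k_{s^{r+1}}k_{s^{u+1}} \Big) \\
&+ \sum_{a+b+c+d+e=\ell+2} c_{abcde}\, k_{s^a}k_{s^b}k_{s^c}k_{s^d}k_{s^e}
\end{aligned}
\]

for constants $c^1_{qru}, c^2_{qru}, c^3_{qru}, c^4_{qru}, c_{abcde} \in \mathbb{R}$ 
with $a, b, c, d, e, q, r, u \ge 0$. 
In particular, 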


(v) 


a 21 
apkss = Kost lOksksakss +k kya + 12kk,sks+ L4kk,akss +5kk23 +2k7k,o 


rf 
4: PK kss + 8k kyaks + 5K, + kk — 4kk3. 


3 Controlling the Geometry of the Flow 


We begin with the following standard result for functions of one variable. 


Lemma 3.1 (Poincaré-Sobolev-Wirtinger (PSW) Inequalities) Suppose 
$f : [0, L] \to \mathbb{R}$, $L > 0$, is absolutely continuous. 


• If $\int_0^L f\,ds = 0$ then 


L 2 L L 
L 2L 
/ fPds< i feds and | fll <= : fods. 
0 4 0 a JO 


• Alternatively, if $f(0) = f(L) = 0$ then 


LG L? L L L 
; frds < = | f2ds and || f \\2, < = | feds. 
0 ’ = 2 0 S ao x Jo Js 


To state the interpolation inequality we will use, we first need to set up some 
notation. For normal tensor fields $S$ and $T$ we denote by $S \star T$ any linear 
combination of $S$ and $T$. In our setting, $S$ and $T$ will be simply curvature 
$k$ or its arc length derivatives. Denote by $P_n^m(k)$ any linear combination of 
terms of type $\partial_s^{i_1}k \star \partial_s^{i_2}k \star \cdots \star \partial_s^{i_n}k$ 
where $m = i_1 + \cdots + i_n$ is the total number of derivatives. 

The following interpolation inequality for closed curves appears in [3]; for our 
setting with boundary we refer to [2]. 
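Proposition 3.2 Let $\gamma : I \to \mathbb{R}^2$ be a smooth curve. Then for any 
term $P_n^m(k)$ with $n \ge 2$ that contains derivatives of $k$ of order at most 
$\ell - 1$, 

\[ \int_I |P_n^m(k)|\,ds \le c\,L^{1-m-n}\, \|k\|_{0,2}^{n-p}\, \|k\|_{\ell,2}^{p}, \]

where $p = \frac{1}{\ell}\left(m + \frac12 n - 1\right)$ and $c = c(\ell, m, n)$. 
Moreover, if $m + \frac{n}{2} < 2\ell + 1$ then $p < 2$ and for any 
$\varepsilon > 0$, 

\[ \int_I |P_n^m(k)|\,ds \le \varepsilon\int_I |\partial_s^\ell k|^2\,ds + c\,\varepsilon^{-\frac{p}{2-p}} \left(\int_I |k|^2\,ds\right)^{\frac{n-p}{2-p}} + c\left(\int_I |k|^2\,ds\right)^{m+n-1}. \]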


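Our first result concerns the winding number of the evolving curve $\gamma$. In 
view of the Neumann boundary condition, in our setting the winding number must be 
a multiple of $\frac12$: 


Lemma 3.3 Under the flow (3), $\omega(t) = \omega(0)$. 


Proof We compute using Lemma 2.3 (i) and (ii) 

\[ \frac{d}{dt}\int_\gamma k\,ds = \int_\gamma F_{ss}\,ds + \int_\gamma k^2F\,ds - \int_\gamma k^2F\,ds = 0, \]

where the first integral vanishes because $F_s = 0$ on the boundary by (4), so 
$\omega$ is constant under the flow. □ 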
Remarks 


• It follows immediately that the average curvature $\bar k$ satisfies (when 
$\omega(0) = 0$) 

\[ \bar k := \frac{1}{L}\int_\gamma k\,ds \equiv 0 \]

under the flow (3). This is important for applying the inequalities of Lemma 3.1. 

• Unlike the situation in [12], here small energy does not automatically imply that 
the winding number is close to zero. Indeed, one may add loops (or half-loops) 
of circles that contribute an arbitrarily small amount of the energy 
$L^3\|k_s\|_2^2$. Note that such loops must all be similarly oriented, as a change 
in contribution from positive to negative winding will necessitate a quantum of 
energy (for example a figure-8 style configuration with $\omega = 0$ cannot have 
small energy despite comprising essentially only mollified arcs of circles). 


Next we give an estimate on the length of the evolving curve in the case of small 
initial energy. Of course, this result does not require the energy as small as (7). 


Lemma 3.4 Under the flow (3) with $\omega(0) = 0$, 

\[ \frac{d}{dt}L[\gamma(t)] \le 0. \]

Proof We compute using integration by parts 


d 2 7 212 7L? 2 2 
ely l= — f kFds=— f k,dst+5 J keds <—|1——> Wsll3| f Rigas 


where we have used Lemma 3.1. The result follows by the small energy assumption. 
□ 


Thus under the small energy assumption we have the length of the evolving curve 
bounded above and below: 

\[ |e| \le L[\gamma] \le L_0. \]

We are now ready to show that the $L^2$-norm of curvature remains bounded, 
independent of time. 

Proposition 3.5 Under the flow (3) with $\omega(0) = 0$, there exists a universal 
$C > 0$ such that 

\[ \|k\|_2^2 \le \|k\|_2^2\Big|_{t=0} + C. \]
Proof Using integration by parts, Lemma 2.3 and the interpolation inequality 
Proposition 3.2 


where we have also used Lemma 3.1 and the length bounds. The result follows. □ 


Moreover, we may show similarly using the evolution equation for $k_{s^\ell}$ that 
all derivatives of curvature are bounded in $L^2$ independent of time. 


Proposition 3.6 Under the flow (3) with $\omega(0) = 0$, there exists a universal 
$C > 0$ such that, for all $\ell \in \mathbb{N}$, 

\[ \|k_{s^\ell}\|_2^2 \le \|k_{s^\ell}\|_2^2\Big|_{t=0} + C. \]

Pointwise bounds on all derivatives of curvature follow from Lemma 3.1. It 
follows that the solution of the flow remains smooth up to and including the final 
time, from which we may (if $T < \infty$) apply again local existence. This shows 
that the flow exists for all time, that is, $T = \infty$. 
4 Exponential Convergence 


Using Lemma 2.3 (i) and (v) and integrating by parts to reduce the order of the 
derivatives we obtain 


Lemma 4.1 Under the flow (3), 


d 
— | keds =—2 / ke,ds+4 i kk2,ds— / kek?,ds—8 7 kkssk2,ds—2 / Kk2.ds 


1 1 8 
Wa / kids — ; ; kkk? ds +5 | Peds + : i k°ds. 


Further integration by parts, use of Lemma 3.1 and throwing away some negative 
terms gives 


Corollary 4.2 Under the flow (3) with $\omega(0) = 0$, 


d fe OH. og MO ca 2 
# fas |-2+ sa West + —e— Uksts| Wks 


Under the small energy condition (7), the coefficient of $\|k_{s^5}\|_2^2$ in 
Corollary 4.2 is bounded above by $-\delta$. Using also Lemma 3.1 we obtain 


Corollary 4.3 There exists a $\delta > 0$ such that, under the flow, 

\[ \frac{d}{dt}\|k_{ss}\|_2^2 \le -\delta\,\|k_{ss}\|_2^2. \]

It follows that $\|k_{ss}\|_2^2$ decays exponentially to zero. 


Proof (Completion of the Proof of Theorem 2.2) Exponential decay of 
$\|k_{ss}\|_2^2$ implies exponential decay of $\|k\|_2^2$, $\|k_s\|_2^2$, 
$\|k\|_\infty$, $\|k_s\|_\infty$ via Lemma 3.1. Exponential decay of 
$\|k_{s^\ell}\|_2^2$ and $\|k_{s^\ell}\|_\infty$ then follows by a standard 
induction argument involving integration by parts and the curvature bounds of 
Propositions 3.5 and 3.6. That $\|k_s\|_2^2 \to 0$ implies subsequential 
convergence to straight line segments (horizontal, in view of boundary 
conditions). A stability argument (see [12] for the details of a similar argument) 
gives that in fact the limiting straight line is unique; all eigenvalues of the 
linearised operator 

\[ \mathcal{L}u = u_{s^6} \]

are negative apart from the first zero eigenvalue, which corresponds precisely to 
vertical translations. By Hale-Raugel’s convergence theorem [6] uniqueness of the 
limit follows. Although we don’t know the precise height of the limiting straight 
line segment, we can estimate a priori its distance from the initial curve, since 

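\[ |\gamma(t) - \gamma_0| = \left|\int_0^t \frac{\partial\gamma}{\partial\tau}\,d\tau\right| \le \int_0^t |F|\,d\tau \le \frac{C}{\delta}\left(1 - e^{-\delta t}\right). \]

□ 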


Quasilinear Parabolic and Elliptic Equations with Singular Potentials 


Maria Michaela Porzio 


Abstract In this paper we describe the asymptotic behavior of the solutions to 
quasilinear parabolic equations with a Hardy potential. We prove that all the 
solutions have the same asymptotic behavior: they all tend to the solution of 
the original problem which satisfies a zero initial condition. Moreover, we derive 
estimates on the “distance” between the solutions of the evolution problem and 
the solutions of elliptic problems showing that in many cases (as for example the 
autonomous case) these last solutions are “good approximations” of the solutions of 
the original parabolic PDE. 


1 Introduction 


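Let us consider the following nonlinear parabolic problem 

\[
\begin{cases}
u_t - \operatorname{div}(a(x,t,\nabla u)) = \lambda\dfrac{u}{|x|^2} + f(x,t) & \text{in } \Omega_T \equiv \Omega\times(0,T),\\
u(x,t) = 0 & \text{on } \partial\Omega\times(0,T),\\
u(x,0) = u_0(x) & \text{in } \Omega,
\end{cases} \tag{1}
\]

where $\Omega \subset \mathbb{R}^N$ ($N \ge 3$) is a bounded domain containing the 
origin and $\lambda$ and $T$ are positive constants. 
Here $a(x,t,\xi) : \Omega\times\mathbb{R}^+\times\mathbb{R}^N \to \mathbb{R}^N$ is 
a Carathéodory function¹ satisfying 

\[ a(x,t,\xi)\cdot\xi \ge \alpha|\xi|^2, \quad \alpha > 0, \tag{2} \]

\[ |a(x,t,\xi)| \le \beta\left[|\xi| + \mu(x,t)\right], \quad \beta > 0, \quad \mu \in L^2(\Omega_T), \tag{3} \]

\[ (a(x,t,\xi) - a(x,t,\xi'))\cdot(\xi - \xi') \ge \alpha|\xi - \xi'|^2, \tag{4} \]

and the data satisfy (for example) 

\[ u_0 \in L^2(\Omega), \quad f \in L^2(\Omega_T). \tag{5} \]

¹ That is, it is continuous with respect to $\xi$ for almost every 
$(x,t) \in \Omega_T$, and measurable with respect to $(x,t)$ for every 
$\xi \in \mathbb{R}^N$. 

The model problem we have in mind is the following 

\[
\begin{cases}
u_t - \Delta u = \lambda\dfrac{u}{|x|^2} + f(x,t) & \text{in } \Omega_T,\\
u(x,t) = 0 & \text{on } \partial\Omega\times(0,T),\\
u(x,0) = u_0(x) & \text{in } \Omega.
\end{cases} \tag{6}
\]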


We recall that if the data $f$ and $u_0$ are nonnegative and not both identically 
zero, there exists a dimension dependent constant $\Lambda_N$ such that (6) has no 
solution for $\lambda > \Lambda_N$ (see [10]). More precisely, the constant 
$\Lambda_N$ is the optimal constant (not attained) in Hardy’s inequality 

\[ \Lambda_N \int_\Omega \frac{u^2}{|x|^2}\,dx \le \int_\Omega |\nabla u|^2\,dx \quad \text{for every } u \in H_0^1(\Omega), \quad \text{where } \Lambda_N \equiv \left(\frac{N-2}{2}\right)^2, \tag{7} \]

(see [24] and [20]). 
Hence, here, in order to guarantee the existence of solutions, we assume 
$\lambda < \Lambda_N$ in the model case (6) and its generalization 

\[ \lambda < \alpha\Lambda_N \tag{8} \]

in the general case (1). 
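
The Hardy constant in (7) and the existence condition (8) are easy to evaluate for concrete parameters. The following small Python sketch does so; the function names `hardy_constant` and `satisfies_existence_condition` are hypothetical, introduced only for this illustration.

```python
def hardy_constant(N):
    """Optimal Hardy constant ((N - 2) / 2)^2 in dimension N >= 3."""
    if N < 3:
        raise ValueError("the setting requires N >= 3")
    return ((N - 2) / 2.0) ** 2


def satisfies_existence_condition(lam, alpha, N):
    """Check condition (8): lambda < alpha * Lambda_N."""
    return lam < alpha * hardy_constant(N)


# In dimension 3 the Hardy constant equals 1/4, in dimension 4 it equals 1.
lambda_3 = hardy_constant(3)
lambda_4 = hardy_constant(4)
ok_small = satisfies_existence_condition(0.2, 1.0, 3)
ok_large = satisfies_existence_condition(0.3, 1.0, 3)
```

For the model problem (6) the ellipticity constant is 1, so the check reduces to comparing the potential strength with the Hardy constant itself.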
The main aim of this paper is the study of the asymptotic behavior of the solutions 
of (1). 

The peculiarity of these problems is the presence of the singular Hardy potential, 
also called in the literature the “inverse-square” potential. This kind of 
singular potential arises, for example, in the context of combustion theory (see 
[11, 47] and the references therein) and quantum mechanics (see [10, 44, 47] and 
the references therein). 

There is an extensive literature on problems with Hardy potentials both in 
the stationary and evolution cases and it is a difficult task to give a complete 
bibliography. In the elliptic case, more related to our framework are [1, 2, 4, 5, 
15, 35, 41, 49, 50] and [7]. In the parabolic case, a milestone is the pioneering 
paper [10] which, by revealing the surprising effects of these singular potentials 
on the solutions, stimulated the study of these problems. More connected to our 
results are [3, 6, 19, 25, 26, 43, 45, 47] and [38]. 

In particular, [43] studies the influence of the regularity of the data $f$ 
and $u_0$ on the regularity of the solutions of (1), while in [47] and [38], among 
other results, there is a description of the behavior (in time) of the solutions 
when $f \equiv 0$. 

Hence, here we want to complete these results by studying the asymptotic 
behavior of the solutions when $f$ is not identically zero. 

We point out that the presence of a singular potential term has a strong influence 
not only, as recalled above, on the existence theory, but also on the regularity and 
on the asymptotic behavior, even when the datum $f$ is zero. As a matter of fact, it 
is well known that if $\lambda = 0 \equiv f$ and the initial datum $u_0$ is bounded 
then also the solution of (1) is bounded; moreover, this result remains true in the 
more general case of non zero data $f$ belonging to $L^r(0, T; L^q(\Omega))$ with 
$r$ and $q$ satisfying 

\[ \frac{1}{r} + \frac{N}{2q} < 1 \tag{9} \]

(see [9] and the references therein). 

Surprisingly, the previous $L^\infty$-regularity fails even in the model case (6) 
as soon as $\lambda$ becomes positive if $f$ and $u_0$ are not both identically 
zero (otherwise $u \equiv 0$ is a bounded solution) since every solution (for 
nonnegative initial data $u_0$) satisfies² 

\[ u(x,t) \ge \frac{C}{|x|^{\alpha_1}} \quad \text{for almost every } (x,t) \in \Omega'\times[\varepsilon, T], \tag{10} \]

for every $\varepsilon \in (0, T)$ and $\Omega' \subset\subset \Omega$, where the 
constant $C$ depends only on $\varepsilon$, $T$, $\Omega'$ and $\lambda$, while 
$\alpha_1$ is the smallest root of $z^2 - (N-2)z + \lambda = 0$. 
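
The blow-up exponent at the origin can be computed explicitly from the quadratic equation just mentioned. The Python sketch below evaluates its smallest root; the function name `smallest_root` is hypothetical, and the sketch assumes the potential strength does not exceed the Hardy constant, so that the roots are real.

```python
import math


def smallest_root(lam, N):
    """Smallest root of z^2 - (N - 2) z + lam = 0."""
    disc = (N - 2) ** 2 - 4.0 * lam
    if disc < 0:
        raise ValueError("no real root: lam exceeds ((N - 2) / 2)^2")
    return ((N - 2) - math.sqrt(disc)) / 2.0


# Without potential the exponent vanishes, so no singularity is forced.
root_zero = smallest_root(0.0, 3)

# In dimension 3 with lam = 3/16 the smallest root is 1/4.
root_sample = smallest_root(0.1875, 3)
residual = root_sample ** 2 - (3 - 2) * root_sample + 0.1875
```

The root is positive as soon as the potential strength is positive, which is why boundedness near the origin is lost for every positive strength.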

Indeed, the singular potential term influences the solutions also when the 
summability coefficients $r$ and $q$ of $f$ do not satisfy (9). As a matter of 
fact, again the regularity of the solutions in presence of the Hardy potential is 
different from the classical semilinear case $\lambda = 0$ (see [43] if 
$\lambda > 0$ and [14, 16, 27, 31, 33, 34] and the references therein if 
$\lambda = 0$). 

Great changes appear also in the behavior in time of the solutions. As a matter 
of fact, if $\lambda = 0 \equiv f(x,t)$ it is well known that the solutions of (1) 
become immediately bounded also in presence of unbounded initial data $u_0$ 
belonging only to $L^{r_0}(\Omega)$ ($r_0 \ge 1$) and satisfy the same decay 
estimates of the heat equation 

\[ \|u(t)\|_{L^\infty(\Omega)} \le c\,\frac{\|u_0\|_{L^{r_0}(\Omega)}}{t^{\frac{N}{2r_0}}\, e^{\sigma t}} \quad \text{for almost every } t \in (0, T), \tag{11} \]

where $\sigma = \frac{c}{|\Omega|^{2/N}}$ is a constant depending on the measure of 
$\Omega$ (see [36] and the references therein). The previous bound, or more in 
general estimates of the type 

\[ \|u(t)\|_{L^\infty(\Omega)} \le c\,\frac{\|u_0\|^{h_0}_{L^{r_0}(\Omega)}}{t^{h_1}}, \quad h_0, h_1 > 0, \tag{12} \]

are often referred to as ultracontractive estimates and hold for many different 
kinds of parabolic PDE (degenerate or singular) like, for example, the p-Laplacian 
equation, the fast diffusion equation, the porous medium equation etc. These 
estimates are widely studied because they describe the behavior in time of the 
solutions and often imply also further important properties like, for example, the 
uniqueness (see for example [8, 12, 17, 18, 21–23, 28–30, 36, 37, 40, 42, 46, 48] 
and the references therein). 

² The proof of (10) can be easily obtained following the outline of the proof of 
(2.5) of Theorem 2.2 in [10]. 
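Unfortunately, by estimate (10) above it follows that estimate (11) together 
with (12) fail in presence of a Hardy potential term. Anyway, in [38] it is proved 
that if $f \equiv 0$, $\lambda > 0$ and $u_0 \in L^2(\Omega)$, then there exists a 
solution that satisfies 

\[ \|u(t)\|_{L^{2\gamma}(\Omega)} \le c\,\frac{\|u_0\|_{L^2(\Omega)}}{t^{\delta}\, e^{\sigma t}} \quad \text{for almost every } t \in (0, T), \quad \delta = \frac{N(\gamma - 1)}{4\gamma}, \tag{13} \]

for every $\gamma > 1$ satisfying 

\[ \gamma \in \left(1, \frac{1 + \sqrt{1 - \theta}}{\theta}\right) \quad \text{where } \theta = \frac{\lambda}{\alpha\Lambda_N}. \]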
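
To see how the admissible range of summability exponents shrinks as the potential strengthens, one can evaluate its upper endpoint numerically. The Python sketch below reads the range as the interval from 1 to (1 + sqrt(1 - theta)) / theta, with theta the ratio of the potential strength to the ellipticity constant times the Hardy constant; the function names are hypothetical and serve only this illustration.

```python
import math


def hardy_constant(N):
    """Optimal Hardy constant ((N - 2) / 2)^2."""
    return ((N - 2) / 2.0) ** 2


def gamma_upper_bound(lam, alpha, N):
    """Upper endpoint (1 + sqrt(1 - theta)) / theta of the admissible range."""
    theta = lam / (alpha * hardy_constant(N))
    if not 0.0 < theta < 1.0:
        raise ValueError("requires 0 < lam < alpha * Lambda_N")
    return (1.0 + math.sqrt(1.0 - theta)) / theta


# Sample values in dimension 4 (Hardy constant 1) with alpha = 1.
bound_weak = gamma_upper_bound(0.1, 1.0, 4)     # weak potential, wide range
bound_strong = gamma_upper_bound(0.75, 1.0, 4)  # stronger potential
```

The endpoint always exceeds 1 for admissible parameters and grows without bound as the potential strength tends to zero, which matches the gain of regularity described in the text.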


Hence an increase of regularity appears (depending on the “size” $\lambda$ of the 
singular potential), but in accordance with (10), the solutions are not bounded. 
As said above, the aim of this paper is to describe what happens when $f$ is not 
identically zero. 

We will show that under the previous assumptions on the operator $a$ and on the 
data $f$ and $u_0$, there exists only one “good” global solution $u$ of (1). 
Moreover, if $v$ is the global solution of 

\[
\begin{cases}
v_t - \operatorname{div}(a(x,t,\nabla v)) = \lambda\dfrac{v}{|x|^2} + f(x,t) & \text{in } \Omega\times(0,+\infty),\\
v(x,t) = 0 & \text{on } \partial\Omega\times(0,+\infty),\\
v(x,0) = v_0(x) & \text{in } \Omega,
\end{cases} \tag{14}
\]

i.e., $v$ satisfies the same PDE as $u$ (with the same datum $f$) but verifies the 
different initial condition $v(x, 0) = v_0 \in L^2(\Omega)$, then the following 
estimate holds 

\[ \|u(t) - v(t)\|_{L^2(\Omega)} \le \frac{\|u_0 - v_0\|_{L^2(\Omega)}}{e^{\sigma t}} \quad \text{for every } t > 0, \tag{15} \]
where $\sigma$ is a positive constant which depends on $\lambda$ (see formula (26) 
below). In particular, it results 

\[ \lim_{t\to+\infty} \|u(t) - v(t)\|_{L^2(\Omega)} = 0. \tag{16} \]

Hence, for $t$ large, the initial data do not influence the behavior of the 
solutions since by (16) it follows that all the global solutions tend to the 
solution which assumes the null initial datum. 

We recall that in absence of the singular potential term we can replace the 
$L^2$-norm in the left-hand side of (15) with the $L^\infty$-norm and, 
consequently, together with (16), the following stronger result holds true 

\[ \lim_{t\to+\infty} \|u(t) - v(t)\|_{L^\infty(\Omega)} = 0 \tag{17} \]

(see [39]). Thus, the presence of the Hardy potential provokes again a change in 
the behavior of the solutions since generally the difference of two solutions $u$ 
and $v$ cannot be bounded if $\lambda > 0$. As a matter of fact, it is sufficient 
to notice that choosing $f = 0$ and $v = 0$ (which corresponds to the choice 
$v_0 = 0$) the boundedness of $u - v$ becomes the boundedness of $u$, which by (10) 
we know to be false in presence of a Hardy potential. 

Moreover, in the autonomous case 

\[ a(x,t,\xi) = a(x,\xi), \qquad f(x,t) = f(x) \]

we prove that all the global solutions of (1) (whatever the value of the initial 
datum $u_0$) tend to the solution $w \in H_0^1(\Omega)$ of the associated elliptic 
problem 

\[
\begin{cases}
-\operatorname{div}(a(x,\nabla w)) = \lambda\dfrac{w}{|x|^2} + f(x) & \text{in } \Omega,\\
w(x) = 0 & \text{on } \partial\Omega.
\end{cases}
\]

Indeed, we estimate also the difference $u - v$ between the global solutions of 
(1) and the global solution $v$ of the different evolution problem (not necessarily 
of parabolic type) 

\[
\begin{cases}
v_t - \operatorname{div}(b(x,t,\nabla v)) = \lambda\dfrac{v}{|x|^2} + F(x,t) & \text{in } \Omega\times(0,+\infty),\\
v(x,t) = 0 & \text{on } \partial\Omega\times(0,+\infty),\\
v(x,0) = v_0(x) & \text{in } \Omega,
\end{cases}
\]

looking for conditions which guarantee that this difference goes to zero (letting 
$t \to +\infty$). 

Finally, we estimate also the difference $u - w$ between a global solution of (1) 
and the solutions $w$ of the stationary problem 

\[
\begin{cases}
-\operatorname{div}(b(x,\nabla w)) = \lambda\dfrac{w}{|x|^2} + F(x) & \text{in } \Omega,\\
w(x) = 0 & \text{on } \partial\Omega,
\end{cases} \tag{18}
\]

showing that in the non autonomous case, under suitable “proximity” conditions on 
the operators $a$ and $b$ and the data $f$ and $F$, the global solution of (1) 
tends to the solution $w$ of the stationary problem (18). 

The paper is organized as follows: in the next section we give the statements of 
our results in full detail. The proofs can be found in Sect. 4 and make use of some 
“abstract results” proved in [38] and [32] that, for the convenience of the reader, 
we recall in Sect. 3. 


2 Main Results 


Before stating our results, we recall the definitions of solution and global solution 
of (1). 


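Definition 1 Assume (2)-(5). A function $u$ in $L^\infty(0,T;L^2(\Omega)) \cap L^2(0,T;H_0^1(\Omega))$ is a solution of (1) if

\[
\int_0^T\!\!\int_\Omega \{-u\varphi_t + a(x,t,\nabla u)\nabla\varphi\}\,dx\,dt
= \int_\Omega u_0\,\varphi(x,0)\,dx + \int_0^T\!\!\int_\Omega \Big[\lambda\frac{u}{|x|^2} + f\Big]\varphi\,dx\,dt
\tag{19}
\]

for every $\varphi \in W^{1,1}(0,T;L^2(\Omega)) \cap L^2(0,T;H_0^1(\Omega))$ satisfying $\varphi(T) = 0$.


We point out that all the integrals in (19) are well defined. As a matter of fact, by (3) it follows that $a(x,t,\nabla u) \in (L^2(\Omega_T))^N$ and, thanks to Hardy’s inequality (7), we have

\[
\int_0^T\!\!\int_\Omega \frac{u}{|x|^2}\,\varphi
\le \Big(\int_0^T\!\!\int_\Omega \frac{u^2}{|x|^2}\Big)^{1/2}\Big(\int_0^T\!\!\int_\Omega \frac{\varphi^2}{|x|^2}\Big)^{1/2}
\le \frac{1}{\Lambda_N}\,\|\nabla u\|_{L^2(\Omega_T)}\,\|\nabla\varphi\|_{L^2(\Omega_T)}.
\tag{20}
\]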


We recall that under the assumptions (2)-(5) and (8) there exist solutions of (1) (see [43]). Now, to extend the previous notion to that of global solution, we assume

\[
f \in L^2_{loc}([0,+\infty);L^2(\Omega)) \quad\text{and}\quad \mu \in L^2_{loc}([0,+\infty);L^2(\Omega)),
\tag{21}
\]

where $\mu$ is the function that appears in (3).

Definition 2 By a global solution of (1), or (equivalently) of

\[
\begin{cases}
u_t - \operatorname{div}(a(x,t,\nabla u)) = \lambda \dfrac{u}{|x|^2} + f(x,t) & \text{in } \Omega \times (0,+\infty), \\
u(x,t) = 0 & \text{on } \partial\Omega \times (0,+\infty), \\
u(x,0) = u_0(x) & \text{in } \Omega,
\end{cases}
\tag{22}
\]

we mean a measurable function $u$ that is a solution of (1) for every $T > 0$ arbitrarily chosen.


We point out that (21), together with the previous structure assumptions, guarantees that the integrals in (19) are well defined for every choice of $T > 0$. In fact, there exists only one global solution of (1). In detail, we have:


Theorem 1 Assume (2)-(5), (8) and (21). Then there exists only one global solution $u$ of (1) belonging to $C_{loc}([0,+\infty);L^2(\Omega)) \cap L^2_{loc}([0,+\infty);H_0^1(\Omega))$. In particular, for every $t > 0$ we have

\[
\int_0^t\!\!\int_\Omega \{-u\varphi_t + a(x,t,\nabla u)\nabla\varphi\}\,dx\,dt + \int_\Omega [u(x,t)\varphi(x,t) - u_0\,\varphi(x,0)]\,dx
= \int_0^t\!\!\int_\Omega \Big[\lambda\frac{u}{|x|^2} + f\Big]\varphi\,dx\,dt
\tag{23}
\]

for every $\varphi \in W^{1,1}_{loc}([0,+\infty);L^2(\Omega)) \cap L^2_{loc}([0,+\infty);H_0^1(\Omega))$.


As noticed in the introduction, if we change the initial data in (1), all the associated global solutions (to these different initial data) have the same asymptotic behavior. In detail, let us consider the following problem

\[
\begin{cases}
v_t - \operatorname{div}(a(x,t,\nabla v)) = \lambda \dfrac{v}{|x|^2} + f(x,t) & \text{in } \Omega \times (0,+\infty), \\
v(x,t) = 0 & \text{on } \partial\Omega \times (0,+\infty), \\
v(x,0) = v_0(x) & \text{in } \Omega.
\end{cases}
\tag{24}
\]

We have the following result:


Theorem 2 Assume (2)-(5), (8) and (21). If $v_0 \in L^2(\Omega)$, then the global solutions $u$ and $v$ of, respectively, (1) and (24) belonging to $C_{loc}([0,+\infty);L^2(\Omega)) \cap L^2_{loc}([0,+\infty);H_0^1(\Omega))$ satisfy

\[
\|u(t) - v(t)\|_{L^2(\Omega)} \le \frac{\|u_0 - v_0\|_{L^2(\Omega)}}{e^{\sigma t}} \qquad \text{for every } t > 0,
\tag{25}
\]

where

\[
\sigma = \Big(\alpha - \frac{\lambda}{\Lambda_N}\Big) c_P
\tag{26}
\]

with $c_P$ Poincaré’s constant.$^3$
In particular, we have

\[
\lim_{t\to+\infty} \|u(t) - v(t)\|_{L^2(\Omega)} = 0.
\tag{28}
\]


Remark 1 Notice that in the particular case $f = 0$, choosing as initial datum $v_0 = 0$ we obtain that $v \equiv 0$ is the global solution of (24). With such a choice, from (25) it follows that

\[
\|u(t)\|_{L^2(\Omega)} \le \frac{\|u_0\|_{L^2(\Omega)}}{e^{\sigma t}} \qquad \text{for every } t > 0.
\]

In the model case (6) the previous estimate can be found (among other interesting results) in [47], with $\sigma$ the first eigenvalue (see also [38]). We recall that decay estimates of the solutions in the same Lebesgue space to which the initial datum belongs are not a peculiarity of problems with singular potentials, since they appear also for other parabolic problems (see [13, 38, 46] and the references therein).
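To get a feel for the size of the decay rate, here is a small worked instance of (26), read as $\sigma = (\alpha - \lambda/\Lambda_N)c_P$. It assumes that (7) is the classical Hardy inequality with optimal constant $\Lambda_N = ((N-2)/2)^2$ (valid for $N \ge 3$) and that the operator is the Laplacian, so that $\alpha = 1$. Both are assumptions made only for this illustration, since (7) and the model case (6) are stated outside this section; the values of $N$ and $\lambda$ are hypothetical.

```latex
% Illustrative values (assumed, not from the paper): N = 3, \alpha = 1, \lambda = 1/8.
\[
\Lambda_3 = \Big(\frac{3-2}{2}\Big)^2 = \frac14,
\qquad
\alpha - \frac{\lambda}{\Lambda_3} = 1 - \frac{1/8}{1/4} = \frac12 > 0,
\qquad
\sigma = \Big(\alpha - \frac{\lambda}{\Lambda_3}\Big)c_P = \frac{c_P}{2}.
\]
% Hence, in this instance, (25) reads
\[
\|u(t) - v(t)\|_{L^2(\Omega)} \le \|u_0 - v_0\|_{L^2(\Omega)}\, e^{-c_P t/2}
\qquad \text{for every } t > 0.
\]
```

As $\lambda$ increases towards $\alpha\Lambda_N$, the factor $\alpha - \lambda/\Lambda_N$ shrinks to zero and the guaranteed rate degenerates.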


An immediate consequence of Theorem 2 is that in the autonomous case

\[
a(x,t,\xi) = a(x,\xi), \qquad f(x,t) = f(x)
\tag{29}
\]

all the global solutions of (1), whatever the value of the initial datum $u_0$, tend (as $t \to +\infty$) to the solution $w$ of the associated elliptic problem

\[
\begin{cases}
-\operatorname{div}(a(x,\nabla w)) = \lambda \dfrac{w}{|x|^2} + f(x) & \text{in } \Omega, \\
w(x) = 0 & \text{on } \partial\Omega.
\end{cases}
\tag{30}
\]

In detail, we have:


Corollary 1 (Autonomous Case) Assume (2)-(5), (8), (21) and (29). Let $w$ be the unique solution of (30) in $H_0^1(\Omega)$ and $u$ be the global solution of (1) belonging to $C_{loc}([0,+\infty);L^2(\Omega)) \cap L^2_{loc}([0,+\infty);H_0^1(\Omega))$. Then we have

\[
\|u(t) - w\|_{L^2(\Omega)} \le \frac{\|u_0 - w\|_{L^2(\Omega)}}{e^{\sigma t}} \qquad \text{for every } t > 0,
\tag{31}
\]

where $\sigma$ is as in (26). In particular, we have

\[
\lim_{t\to+\infty} \|u(t) - w\|_{L^2(\Omega)} = 0.
\tag{32}
\]

$^3$ Poincaré’s inequality:

\[
c_P \int_\Omega u^2\,dx \le \int_\Omega |\nabla u|^2\,dx \qquad \text{for every } u \in H_0^1(\Omega),
\tag{27}
\]

where $c_P$ is a constant depending only on $N$ and on the bounded set $\Omega$.


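We now show that it is also possible to estimate the distance between the global solution $u$ of (1) and the global solution $v$ of the different parabolic problem

\[
\begin{cases}
v_t - \operatorname{div}(b(x,t,\nabla v)) = \lambda \dfrac{v}{|x|^2} + F(x,t) & \text{in } \Omega \times (0,+\infty), \\
v(x,t) = 0 & \text{on } \partial\Omega \times (0,+\infty), \\
v(x,0) = v_0(x) & \text{in } \Omega,
\end{cases}
\tag{33}
\]

where $b(x,t,\xi) : \Omega \times \mathbb{R}^+ \times \mathbb{R}^N \to \mathbb{R}^N$ is a Caratheodory function satisfying

\[ b(x,t,\xi)\cdot\xi \ge \alpha_0 |\xi|^2, \qquad \alpha_0 > 0, \tag{34} \]

\[ |b(x,t,\xi)| \le \beta_0 [\,|\xi| + \mu_0(x,t)], \qquad \beta_0 > 0, \quad \mu_0 \in L^2_{loc}([0,+\infty);L^2(\Omega)), \tag{35} \]

\[ (b(x,t,\xi) - b(x,t,\xi'))\cdot(\xi - \xi') \ge \alpha_0 |\xi - \xi'|^2, \tag{36} \]

\[ v_0 \in L^2(\Omega), \qquad F \in L^2_{loc}([0,+\infty);L^2(\Omega)). \tag{37} \]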


Theorem 3 Assume (2)-(5), (8), (21) and (34)-(37). Then the global solutions $u$ and $v$ of, respectively, (1) and (33) belonging to $C_{loc}([0,+\infty);L^2(\Omega)) \cap L^2_{loc}([0,+\infty);H_0^1(\Omega))$ satisfy

\[
\|u(t) - v(t)\|^2_{L^2(\Omega)} \le \frac{\|u_0 - v_0\|^2_{L^2(\Omega)}}{e^{2\sigma_0 t}} + \int_0^t g(s)\,ds \qquad \text{for every } t > 0,
\tag{38}
\]

for every choice of

\[ 0 < \sigma_0 < \sigma, \tag{39} \]

where

\[
g(s) = \frac{c_P}{\sigma - \sigma_0} \int_\Omega \Big[\, |b(x,s,\nabla v(x,s)) - a(x,s,\nabla v(x,s))|^2 + \frac{1}{c_P}\,|f(x,s) - F(x,s)|^2 \Big]\,dx,
\tag{40}
\]

with $\sigma$ and $c_P$ as in (26). Moreover, if $g \in L^1((0,+\infty))$, then

\[
\|u(t) - v(t)\|^2_{L^2(\Omega)} \le \frac{\Lambda}{e^{\sigma_0 t}} + \int_{t/2}^{t} g(s)\,ds \qquad \text{for every } t > 0,
\tag{41}
\]

where

\[
\Lambda = \|u_0 - v_0\|^2_{L^2(\Omega)} + \int_0^{+\infty} g(t)\,dt.
\]

In particular, we have

\[
\lim_{t\to+\infty} \|u(t) - v(t)\|_{L^2(\Omega)} = 0.
\tag{42}
\]
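As a quick consistency check of Theorem 3, consider the special case in which the two problems differ only in their initial data. The sketch below reads $g$ in (40) in the form obtained in the proof in Sect. 4.4, namely $g(s) = \frac{c_P}{\sigma-\sigma_0}\int_\Omega [\,|b-a|^2 + c_P^{-1}|f-F|^2]\,dx$; the choice $b = a$, $F = f$ is made only for illustration.

```latex
% Special case (for illustration): b = a and F = f, so that (33) coincides with (24).
\[
g(s) = \frac{c_P}{\sigma-\sigma_0}\int_\Omega \Big[\,|a(x,s,\nabla v) - a(x,s,\nabla v)|^2
      + \frac{1}{c_P}\,|f(x,s) - f(x,s)|^2\Big]\,dx = 0 .
\]
% Then (38) becomes, for every 0 < \sigma_0 < \sigma,
\[
\|u(t)-v(t)\|^2_{L^2(\Omega)} \le \|u_0 - v_0\|^2_{L^2(\Omega)}\, e^{-2\sigma_0 t}.
\]
% Letting \sigma_0 \to \sigma and taking square roots:
\[
\|u(t)-v(t)\|_{L^2(\Omega)} \le \|u_0 - v_0\|_{L^2(\Omega)}\, e^{-\sigma t}.
\]
```

The last line is the estimate (25) of Theorem 2, so in this special case Theorem 3 loses nothing with respect to Theorem 2.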


Remark 2 The proof of Theorem 3 shows that the structure assumptions (34)-(36) on the operator $b$ can be weakened. In particular, it is sufficient to assume that there exists a global solution $v$ of (33) in $C_{loc}([0,+\infty);L^2(\Omega)) \cap L^2_{loc}([0,+\infty);H_0^1(\Omega))$ satisfying

\[ b(x,t,\nabla v) \in L^2_{loc}([0,+\infty);L^2(\Omega)). \]

Hence, also problems (33) which are not of parabolic type are allowed.

Moreover, with slight changes in the proof, it is also possible to choose a larger class of data $f$ and $F$. In particular, an alternative choice is $L^2_{loc}([0,+\infty);H^{-1}(\Omega))$.


Examples of operators satisfying all the assumptions of the previous Theorem (and hence for which (42) holds) are

\[
\begin{cases}
u_t - \operatorname{div}(a(x,t,\nabla u)) = \lambda \dfrac{u}{|x|^2} + f(x,t) & \text{in } \Omega \times (0,+\infty), \\
u(x,t) = 0 & \text{on } \partial\Omega \times (0,+\infty), \\
u(x,0) = u_0 & \text{in } \Omega,
\end{cases}
\]

and the model case

\[
\begin{cases}
v_t - \Delta v = \lambda \dfrac{v}{|x|^2} + F(x,t) & \text{in } \Omega \times (0,+\infty), \\
v(x,t) = 0 & \text{on } \partial\Omega \times (0,+\infty), \\
v(x,0) = v_0 & \text{in } \Omega,
\end{cases}
\]

if we assume

\[
[a(x,t,\nabla v) - \nabla v] \in L^2(\Omega \times (0,+\infty)), \qquad [f(x,t) - F(x,t)] \in L^2(\Omega \times (0,+\infty)).
\]


Remark 3 We point out that an admissible choice for the parameter $\lambda$ in Theorem 3 is

\[ \lambda = 0, \]

i.e., the case of absence of the singular potential. In this particular but also interesting case, the previous result allows one to estimate the difference of solutions of different evolution problems. Moreover, as noticed in Remark 2, only one of these two evolution problems is required to be of parabolic type.


A consequence of Theorem 3 is the possibility to estimate also the distance between global solutions of (1) and solutions of stationary problems, for example of elliptic type, without assuming to be in the autonomous case (29). These estimates (see Corollary 2 below) show that if the data and the operators of these different PDE problems (calculated on the solution of the stationary problem) are “sufficiently near”, then the solutions of the evolution problems tend (for every choice of the initial data $u_0$) to the stationary solution.

In detail, let us consider the following stationary problem

\[
\begin{cases}
-\operatorname{div}(b(x,\nabla w)) = \lambda \dfrac{w}{|x|^2} + F(x) & \text{in } \Omega, \\
w(x) = 0 & \text{on } \partial\Omega.
\end{cases}
\tag{43}
\]

To stress the assumptions really needed, in what follows we do not assume any structure condition on $b$ except that $b : \Omega \times \mathbb{R}^N \to \mathbb{R}^N$ is a Caratheodory function.
We have: 


Corollary 2 Assume (2)-(5), (8) and (21). Let $F$ be in $L^2(\Omega)$ and $w \in H_0^1(\Omega)$ be such that

\[ b(x,\nabla w) \in (L^2(\Omega))^N. \tag{44} \]

If $w$ is a solution of (43) and $u \in C_{loc}([0,+\infty);L^2(\Omega)) \cap L^2_{loc}([0,+\infty);H_0^1(\Omega))$ is the global solution of (1), then the following estimate holds true

\[
\|u(t) - w\|^2_{L^2(\Omega)} \le \frac{\|u_0 - w\|^2_{L^2(\Omega)}}{e^{2\sigma_0 t}} + \int_0^t g(s)\,ds,
\tag{45}
\]

for every $t > 0$ and for every choice of $\sigma_0$ as in (39), where

\[
g(s) = \frac{c_P}{\sigma - \sigma_0} \int_\Omega \Big[\, |b(x,\nabla w(x)) - a(x,s,\nabla w(x))|^2 + \frac{1}{c_P}\,|f(x,s) - F(x)|^2 \Big]\,dx.
\tag{46}
\]

Moreover, if $g \in L^1((0,+\infty))$, then

\[
\|u(t) - w\|^2_{L^2(\Omega)} \le \frac{\Lambda}{e^{\sigma_0 t}} + \int_{t/2}^{t} g(s)\,ds \qquad \text{for every } t > 0,
\tag{47}
\]

where

\[
\Lambda = \|u_0 - w\|^2_{L^2(\Omega)} + \int_0^{+\infty} g(t)\,dt.
\]

In particular, it follows that

\[
\lim_{t\to+\infty} \|u(t) - w\|_{L^2(\Omega)} = 0.
\tag{48}
\]


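Examples of operators satisfying all the assumptions of Corollary 2 (and hence for which (48) holds) are

\[
\begin{cases}
u_t - \operatorname{div}(\alpha(x,t)\nabla u) = \lambda \dfrac{u}{|x|^2} + f(x,t) & \text{in } \Omega \times (0,+\infty), \\
u(x,t) = 0 & \text{on } \partial\Omega \times (0,+\infty), \\
u(x,0) = u_0 & \text{in } \Omega,
\end{cases}
\]

and

\[
\begin{cases}
-\Delta w = \lambda \dfrac{w}{|x|^2} + F(x) & \text{in } \Omega, \\
w(x) = 0 & \text{on } \partial\Omega,
\end{cases}
\]

with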


[o(x,t)- 1] e L°(2Q x O,+00)) Lf (@, 4) — F@)) € L°(@ x (0, +00) 


(49) 
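or

\[
\begin{cases}
u_t - \operatorname{div}(\alpha(x,t)\,b(x,\nabla u)) = \lambda \dfrac{u}{|x|^2} + f(x,t) & \text{in } \Omega \times (0,+\infty), \\
u(x,t) = 0 & \text{on } \partial\Omega \times (0,+\infty), \\
u(x,0) = u_0 & \text{in } \Omega,
\end{cases}
\]

and

\[
\begin{cases}
-\operatorname{div}(b(x,\nabla w)) = \lambda \dfrac{w}{|x|^2} + F(x) & \text{in } \Omega, \\
w(x) = 0 & \text{on } \partial\Omega,
\end{cases}
\]

with $\alpha(x,t)$ and the data $f$ and $F$ satisfying (49).


3 Preliminary Results


In this section we state two results that will be essential tools in proving the theorems presented above.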


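Theorem 4 (Theorem 2.8 in [38]) Let $u$ be in $C((0,T);L^r(\Omega)) \cap L^\infty(0,T;L^{r_0}(\Omega))$ where $0 < r \le r_0 < \infty$. Suppose also that $|\Omega| < +\infty$ if $r \ne r_0$ (no assumption is needed on $|\Omega|$ if $r = r_0$). If $u$ satisfies

\[
\int_\Omega |u|^r(t_2) - \int_\Omega |u|^r(t_1) + c_1 \int_{t_1}^{t_2} \|u(t)\|^r_{L^r(\Omega)}\,dt \le 0 \qquad \text{for every } 0 < t_1 < t_2 < T,
\tag{50}
\]

and there exists $u_0 \in L^{r_0}(\Omega)$ such that

\[
\|u(t)\|_{L^{r_0}(\Omega)} \le c_2 \|u_0\|_{L^{r_0}(\Omega)} \qquad \text{for almost every } t \in (0,T),
\tag{51}
\]

where $c_i$, $i = 1, 2$, are real positive numbers, then the following estimate holds true

\[
\|u(t)\|_{L^r(\Omega)} \le c_4\, \frac{\|u_0\|_{L^{r_0}(\Omega)}}{e^{\sigma t}} \qquad \text{for every } 0 < t < T,
\tag{52}
\]

where

\[
c_4 = \begin{cases} c_2\,|\Omega|^{\frac1r - \frac1{r_0}} & \text{if } r < r_0, \\ 1 & \text{if } r = r_0, \end{cases}
\qquad
\sigma = \frac{c_1}{r}.
\]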
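The mechanism behind Theorem 4 is easiest to see when $r = r_0 = 2$. What follows is only a formal sketch: it assumes that $y(t) = \|u(t)\|^2_{L^2(\Omega)}$ is differentiable and that $u(t) \to u_0$ in $L^2(\Omega)$ as $t \to 0$, neither of which is part of the hypotheses of Theorem 4; the rigorous argument is the one in [38].

```latex
% Formal sketch for r = r_0 = 2, with y(t) = \|u(t)\|^2_{L^2(\Omega)} assumed differentiable.
% Dividing (50) by t_2 - t_1 and letting t_2 \to t_1:
\[
y(t_2) - y(t_1) + c_1\int_{t_1}^{t_2} y(t)\,dt \le 0
\;\Longrightarrow\;
y'(t) + c_1\,y(t) \le 0
\;\Longrightarrow\;
\frac{d}{dt}\Big(e^{c_1 t}\,y(t)\Big) \le 0 .
\]
% Integrating on (0,t) and taking square roots:
\[
y(t) \le y(0)\,e^{-c_1 t}
\;\Longrightarrow\;
\|u(t)\|_{L^2(\Omega)} \le \|u_0\|_{L^2(\Omega)}\, e^{-(c_1/2)\,t}.
\]
```

The exponent $c_1/2$ is $c_1/r$ with $r = 2$. In the proof of Theorem 2 one has $c_1 = c_P c_0 = 2(\alpha - \lambda/\Lambda_N)c_P$, so $c_1/2$ is exactly the rate $(\alpha - \lambda/\Lambda_N)c_P$ that appears in the decay estimate of Theorem 2.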


Proposition 1 (Proposition 3.2 in [32]) Assume $T \in (t_0,+\infty]$ and let $\varphi(t)$ be a continuous and nonnegative function defined in $[t_0,T)$ verifying

\[
\varphi(t_2) - \varphi(t_1) + M \int_{t_1}^{t_2} \varphi(t)\,dt \le \int_{t_1}^{t_2} g(t)\,dt
\]

for every $t_0 \le t_1 \le t_2 < T$, where $M$ is a positive constant and $g$ is a nonnegative function in $L^1_{loc}([t_0,T))$. Then for every $t \in (t_0,T)$ we get

\[
\varphi(t) \le \varphi(t_0)\,e^{-M(t - t_0)} + \int_{t_0}^{t} g(s)\,ds.
\tag{53}
\]

Moreover, if $T = +\infty$ and $g$ belongs to $L^1((t_0,+\infty))$, there exists $t_1 \ge t_0$ (for example $t_1 = 2t_0$) such that

\[
\varphi(t) \le \Lambda\, e^{-\frac{M}{2}t} + \int_{t/2}^{t} g(s)\,ds \qquad \text{for every } t \ge t_1,
\tag{54}
\]

where

\[
\Lambda = \varphi(t_0) + \int_{t_0}^{+\infty} g(s)\,ds.
\]

In particular, we get that

\[
\lim_{t\to+\infty} \varphi(t) = 0.
\]
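To see where the two bounds of Proposition 1 come from, the following formal sketch may help. It assumes, purely for illustration, that $\varphi$ is differentiable (the proposition only requires continuity) and, in the second part, that $t_0 = 0$; the rigorous proof is the one in [32].

```latex
% Formal sketch; \varphi is assumed differentiable here, which the proposition does not require.
% Differential form of the hypothesis:
\[
\varphi'(t) + M\varphi(t) \le g(t).
\]
% Multiply by e^{Mt} and integrate over (t_0, t):
\[
\varphi(t) \le \varphi(t_0)\,e^{-M(t-t_0)} + \int_{t_0}^{t} e^{-M(t-s)}\,g(s)\,ds
          \le \varphi(t_0)\,e^{-M(t-t_0)} + \int_{t_0}^{t} g(s)\,ds ,
\]
% because 0 < e^{-M(t-s)} \le 1 for s \le t and g \ge 0.
% Decay when g \in L^1((0,+\infty)) and t_0 = 0: apply the same bound on (t/2, t) and on (0, t/2):
\[
\varphi(t) \le \varphi(t/2)\,e^{-Mt/2} + \int_{t/2}^{t} g(s)\,ds,
\qquad
\varphi(t/2) \le \varphi(0) + \int_{0}^{t/2} g(s)\,ds \le \Lambda,
\]
% so that
\[
\varphi(t) \le \Lambda\,e^{-Mt/2} + \int_{t/2}^{t} g(s)\,ds \longrightarrow 0
\qquad (t \to +\infty).
\]
```

Both terms on the right vanish in the limit: the first because $M > 0$, the second because it is a tail of the convergent integral $\int_0^{+\infty} g$.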


4 Proofs of the Results 


4.1 Proof of Theorem 1 


Let $T > 0$ be arbitrarily fixed. The existence of a solution $u \in L^\infty(0,T;L^2(\Omega)) \cap L^2(0,T;H_0^1(\Omega))$ of (1) can be found in [43]. We point out that since $u_t$ belongs to $L^2(0,T;H^{-1}(\Omega))$ (thanks to the regularity of $u$ and (20)), it follows that $u$ also belongs to $C([0,T];L^2(\Omega))$. Consequently, we have

\[
\int_0^T\!\!\int_\Omega \{-u\varphi_t + a(x,t,\nabla u)\nabla\varphi\}\,dx\,dt + \int_\Omega [u(x,T)\varphi(x,T) - u_0\,\varphi(x,0)]\,dx
= \int_0^T\!\!\int_\Omega \Big[\lambda\frac{u}{|x|^2} + f\Big]\varphi\,dx\,dt,
\tag{55}
\]

for every $\varphi \in W^{1,1}(0,T;L^2(\Omega)) \cap L^2(0,T;H_0^1(\Omega))$. Moreover, $u$ is the unique solution of (1) belonging to $C([0,T];L^2(\Omega)) \cap L^2(0,T;H_0^1(\Omega))$. As a matter of fact, if there exists another solution $v$ of (1) in $C([0,T];L^2(\Omega)) \cap L^2(0,T;H_0^1(\Omega))$, taking as test function $u - v$ in the equation satisfied by $u$ and in that satisfied by $v$ and subtracting the results$^4$ we deduce (using (4))

\[
\frac12 \int_\Omega [u(x,T) - v(x,T)]^2 + \alpha \int_0^T\!\!\int_\Omega |\nabla(u - v)|^2 \le \lambda \int_0^T\!\!\int_\Omega \frac{|u - v|^2}{|x|^2}.
\]

By the previous estimate and Hardy’s inequality (7) we obtain

\[
\frac12 \int_\Omega [u(x,T) - v(x,T)]^2 + \Big(\alpha - \frac{\lambda}{\Lambda_N}\Big) \int_0^T\!\!\int_\Omega |\nabla(u - v)|^2 \le 0,
\]

from which the uniqueness follows since by assumption $\alpha - \frac{\lambda}{\Lambda_N} > 0$. Hence, for every arbitrarily fixed $T > 0$ there exists a unique solution of (1) in $C([0,T];L^2(\Omega)) \cap L^2(0,T;H_0^1(\Omega))$, which we denote by $u^T$.

To conclude the proof, let us now construct the global solution of (1). For every $t > 0$ let us define $u(x,t) = u^T(x,t)$, where $T$ is arbitrarily chosen satisfying $T > t$. We notice that, by the uniqueness proved above, this definition is well posed. Moreover, by construction this function satisfies the assertions of the theorem. □

$^4$ The use here and below of these test functions can be made rigorous by means of the Steklov averaging process.


4.2 Proof of Theorem 2 


Let $u$ and $v$ be as in the statement of Theorem 2. Taking as test function $u - v$ in (1) and in (24) and subtracting the equations obtained in this way, we deduce (using assumption (4)) that for every $0 \le t_1 < t_2$

\[
\frac12 \int_\Omega [u(x,t_2) - v(x,t_2)]^2\,dx - \frac12 \int_\Omega [u(x,t_1) - v(x,t_1)]^2 + \alpha \int_{t_1}^{t_2}\!\!\int_\Omega |\nabla(u - v)|^2
\le \lambda \int_{t_1}^{t_2}\!\!\int_\Omega \frac{|u - v|^2}{|x|^2}.
\tag{57}
\]

Using again Hardy’s inequality (7), from (57) we deduce

\[
\int_\Omega [u(x,t_2) - v(x,t_2)]^2\,dx - \int_\Omega [u(x,t_1) - v(x,t_1)]^2 + c_0 \int_{t_1}^{t_2}\!\!\int_\Omega |\nabla(u - v)|^2 \le 0,
\tag{58}
\]

where we have defined

\[
c_0 = 2\Big(\alpha - \frac{\lambda}{\Lambda_N}\Big).
\tag{59}
\]

Thanks to Poincaré’s inequality (27), by the previous estimate we get for every $0 \le t_1 < t_2$

\[
\int_\Omega [u(x,t_2) - v(x,t_2)]^2\,dx - \int_\Omega [u(x,t_1) - v(x,t_1)]^2 + c_1 \int_{t_1}^{t_2}\!\!\int_\Omega |u - v|^2 \le 0,
\tag{60}
\]

where $c_1 = c_P c_0$. Notice that by (60) (choosing $t_2 = t$ and $t_1 = 0$) it follows also that

\[
\|u(t) - v(t)\|_{L^2(\Omega)} \le \|u_0 - v_0\|_{L^2(\Omega)}.
\]

Now the assertion follows applying Theorem 4 with $r = r_0 = 2$. □


4.3 Proof of Corollary 1


The assertion (31) follows by Theorem 2 once noticed that, thanks to the assumption (29), the solution $w \in H_0^1(\Omega)$ of (30) is also the global solution $w(x,t) = w(x) \in C_{loc}([0,+\infty);L^2(\Omega)) \cap L^2_{loc}([0,+\infty);H_0^1(\Omega))$ of the following parabolic problem

\[
\begin{cases}
w_t - \operatorname{div}(a(x,\nabla w)) = \lambda \dfrac{w}{|x|^2} + f(x) & \text{in } \Omega \times (0,+\infty), \\
w(x,t) = 0 & \text{on } \partial\Omega \times (0,+\infty), \\
w(x,0) = w(x) & \text{in } \Omega.
\end{cases}
\]


4.4 Proof of Theorem 3 


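Let $u$ and $v$ be the global solutions in $C_{loc}([0,+\infty);L^2(\Omega)) \cap L^2_{loc}([0,+\infty);H_0^1(\Omega))$ of, respectively, (1) and (33). Taking $u - v$ as test function in both problems (1) and (33) and subtracting the results, we obtain for every $0 \le t_1 < t_2$

\[
\frac12 \int_\Omega [u(x,t_2) - v(x,t_2)]^2\,dx - \frac12 \int_\Omega [u(x,t_1) - v(x,t_1)]^2
+ \int_{t_1}^{t_2}\!\!\int_\Omega [a(x,t,\nabla u) - b(x,t,\nabla v)]\,\nabla(u - v)
\le \lambda \int_{t_1}^{t_2}\!\!\int_\Omega \frac{(u - v)^2}{|x|^2} + \int_{t_1}^{t_2}\!\!\int_\Omega (f - F)(u - v),
\]

which is equivalent to the following estimate

\[
\frac12 \int_\Omega [u(x,t_2) - v(x,t_2)]^2\,dx - \frac12 \int_\Omega [u(x,t_1) - v(x,t_1)]^2
+ \int_{t_1}^{t_2}\!\!\int_\Omega [a(x,t,\nabla u) - a(x,t,\nabla v)]\,\nabla(u - v)
\le \lambda \int_{t_1}^{t_2}\!\!\int_\Omega \frac{(u - v)^2}{|x|^2} + \int_{t_1}^{t_2}\!\!\int_\Omega (f - F)(u - v)
+ \int_{t_1}^{t_2}\!\!\int_\Omega [b(x,t,\nabla v) - a(x,t,\nabla v)]\,\nabla(u - v).
\tag{61}
\]

By assumption (4), Hardy’s inequality (7) and (61) we deduce

\[
\frac12 \int_\Omega [u(x,t_2) - v(x,t_2)]^2\,dx - \frac12 \int_\Omega [u(x,t_1) - v(x,t_1)]^2
+ \Big(\alpha - \frac{\lambda}{\Lambda_N}\Big) \int_{t_1}^{t_2}\!\!\int_\Omega |\nabla(u - v)|^2
\le \int_{t_1}^{t_2}\!\!\int_\Omega (f - F)(u - v) + \int_{t_1}^{t_2}\!\!\int_\Omega [b(x,t,\nabla v) - a(x,t,\nabla v)]\,\nabla(u - v).
\tag{62}
\]

We estimate the last two integrals in (62). Let $\theta \in (0,1)$ be a constant that we will choose below. Using Young’s and Poincaré’s inequalities we get

\[
\int_{t_1}^{t_2}\!\!\int_\Omega (f - F)(u - v)
\le \frac{\theta C_0 c_P}{2} \int_{t_1}^{t_2}\!\!\int_\Omega (u - v)^2 + \frac{1}{2\theta C_0 c_P} \int_{t_1}^{t_2}\!\!\int_\Omega |f - F|^2
\le \frac{\theta C_0}{2} \int_{t_1}^{t_2}\!\!\int_\Omega |\nabla(u - v)|^2 + \frac{1}{2\theta C_0 c_P} \int_{t_1}^{t_2}\!\!\int_\Omega |f - F|^2,
\]

where $c_P$ is Poincaré’s constant defined in (27) and $C_0 = \alpha - \frac{\lambda}{\Lambda_N}$. Moreover, we have

\[
\int_{t_1}^{t_2}\!\!\int_\Omega [b(x,t,\nabla v) - a(x,t,\nabla v)]\,\nabla(u - v)
\le \frac{\theta C_0}{2} \int_{t_1}^{t_2}\!\!\int_\Omega |\nabla(u - v)|^2 + \frac{1}{2\theta C_0} \int_{t_1}^{t_2}\!\!\int_\Omega |b(x,t,\nabla v) - a(x,t,\nabla v)|^2.
\]

By the previous estimates we deduce that

\[
\int_\Omega [u(x,t_2) - v(x,t_2)]^2\,dx - \int_\Omega [u(x,t_1) - v(x,t_1)]^2 + 2(1 - \theta)C_0 \int_{t_1}^{t_2}\!\!\int_\Omega |\nabla(u - v)|^2
\le \frac{1}{\theta C_0 c_P} \int_{t_1}^{t_2}\!\!\int_\Omega |f - F|^2 + \frac{1}{\theta C_0} \int_{t_1}^{t_2}\!\!\int_\Omega |b(x,t,\nabla v) - a(x,t,\nabla v)|^2,
\]

which implies (again by Poincaré’s inequality)

\[
\int_\Omega [u(x,t_2) - v(x,t_2)]^2\,dx - \int_\Omega [u(x,t_1) - v(x,t_1)]^2 + M \int_{t_1}^{t_2}\!\!\int_\Omega |u - v|^2 \le \int_{t_1}^{t_2} g(s)\,ds,
\tag{63}
\]

where $M = 2c_P(1 - \theta)C_0 = 2(1 - \theta)\sigma$ (where $\sigma$ is as in (26)) and

\[
g(s) = \frac{1}{\theta C_0} \int_\Omega \Big[ \frac{1}{c_P}\,|f(x,s) - F(x,s)|^2 + |b(x,s,\nabla v) - a(x,s,\nabla v)|^2 \Big]\,dx.
\tag{64}
\]

Denoting $\sigma_0 = (1 - \theta)\sigma$ (i.e., $\theta = 1 - \frac{\sigma_0}{\sigma}$) and applying Proposition 1 with $\varphi(t) = \int_\Omega |u(x,t) - v(x,t)|^2\,dx$ and $t_0 = 0$, the assertions follow. □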
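The final step of the proof of Theorem 3 compresses a short change of constants. The following worked computation is a sketch of how the quantities of the proof, $\theta$, $C_0 = \alpha - \lambda/\Lambda_N$ and $M$, translate into the constants $\sigma$ and $\sigma_0$ used in the statement of Theorem 3; it uses only the relations $\sigma = c_P C_0$ and $\theta = 1 - \sigma_0/\sigma$ given in the text.

```latex
% Constants: C_0 = \alpha - \lambda/\Lambda_N, \sigma = c_P C_0, \theta = 1 - \sigma_0/\sigma.
\[
\theta C_0 = \Big(1 - \frac{\sigma_0}{\sigma}\Big)\frac{\sigma}{c_P} = \frac{\sigma - \sigma_0}{c_P},
\qquad
M = 2(1-\theta)\sigma = 2\sigma_0 .
\]
% Substituting \theta C_0 into (64):
\[
g(s) = \frac{1}{\theta C_0}\int_\Omega \Big[\frac{1}{c_P}|f - F|^2 + |b(x,s,\nabla v) - a(x,s,\nabla v)|^2\Big]dx
     = \frac{c_P}{\sigma - \sigma_0}\int_\Omega \Big[|b(x,s,\nabla v) - a(x,s,\nabla v)|^2 + \frac{1}{c_P}|f - F|^2\Big]dx .
\]
% Proposition 1 with t_0 = 0, M = 2\sigma_0 and \varphi(t) = \|u(t) - v(t)\|^2_{L^2(\Omega)}:
\[
\varphi(t) \le \varphi(0)\,e^{-2\sigma_0 t} + \int_0^t g(s)\,ds .
\]
```

The second display is the function $g$ of the statement of Theorem 3 and the third is its first estimate; the condition $\theta \in (0,1)$ is the same as $0 < \sigma_0 < \sigma$.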


4.5 Proof of Corollary 2 


The assertions follow observing that $w(x,t) = w(x)$ is also a global solution in

\[ C_{loc}([0,+\infty);L^2(\Omega)) \cap L^2_{loc}([0,+\infty);H_0^1(\Omega)) \]

of the following evolution problem

\[
\begin{cases}
w_t - \operatorname{div}(b(x,\nabla w)) = \lambda \dfrac{w}{|x|^2} + F(x) & \text{in } \Omega \times (0,+\infty), \\
w(x,t) = 0 & \text{on } \partial\Omega \times (0,+\infty), \\
w(x,0) = w(x) & \text{in } \Omega.
\end{cases}
\]

Medet Nursultanov, Julie Rowlett, and David Sher


Abstract We prove that the existence of corners in a class of planar domains,
which includes all simply connected polygonal domains and all smoothly bounded 
domains, is a spectral invariant of the Laplacian with both Neumann and Robin 
boundary conditions. The main ingredient in the proof is a locality principle in the 
spirit of Kac’s “principle of not feeling the boundary,” but which holds uniformly 
up to the boundary. Albeit previously known for Dirichlet boundary condition, this 
appears to be new for Robin and Neumann boundary conditions, in the geometric 
generality presented here. For the case of curvilinear polygons, we describe how the 
same arguments using the locality principle are insufficient to obtain the analogous 
result. However, we describe how one may be able to harness powerful microlocal 
methods and combine these with the locality principles demonstrated here to show 
that corners are a spectral invariant; this is current work-in-progress (Nursultanov 
et al., Preprint). 


1 Introduction 


It is well known that “one cannot hear the shape of a drum” [8, 22, 27]. Mathematically, this means that there exist bounded planar domains which have the same eigenvalues for the Laplacian with Dirichlet boundary condition, in spite of the domains having different shapes. The standard example is shown in Fig. 1. Two geometric characteristics of these domains are immediately apparent:


1. These domains both have corners.
2. Neither of these domains is convex.


Fig. 1 These two domains were demonstrated by Gordon, Webb, and Wolpert to be isospectral for the Laplacian with Dirichlet boundary condition [9]. This image is from Wikipedia Commons


This naturally leads to the following two open problems:
Problem 1 Can one hear the shape of a smoothly bounded drum?
Problem 2 Can one hear the shape of a convex drum?


The mathematical formulation of these problems is: if two smoothly bounded (respectively, convex) domains in the plane are isospectral for the Laplacian with Dirichlet boundary condition, then are they the same shape?

One could dare to conjecture that the answer to Problem 1 is yes, based on the
isospectrality result of Zelditch [36]. He proved that if two analytically bounded 
domains both have a bilateral symmetry and are isospectral, then they are in fact the 
same shape. For certain classes of convex polygonal domains including triangles [5, 
11]; parallelograms [15]; and trapezoids [12]; if two such domains are isospectral, 
then they are indeed the same shape. This could lead one to suppose that perhaps 
Problem 2 also has a positive answer. 

Contemplating these questions led the second author and Z. Lu to investigate 
whether smoothly bounded domains can be isospectral to domains with corners. 
In [16], they proved that for the Dirichlet boundary condition, “one can hear the 
corners of a drum” in the sense that a domain with corners cannot be isospectral to 
a smoothly bounded domain. Here we generalize that result to both Neumann and 
Robin boundary conditions. 

The key technical tool in the proof is a locality principle for the Neumann and Robin boundary conditions in a general context which includes domains with only piecewise smooth boundary. This locality principle may be of independent interest, because it not only generalizes Kac’s “principle of not feeling the boundary” [13] but also, unlike that principle, holds uniformly up to the boundary. First, we explain Kac’s locality principle. Let $\Omega$ be a bounded domain in $\mathbb{R}^2$, or more generally $\mathbb{R}^n$, because the argument works in the same way in all dimensions. Assume the Dirichlet boundary condition, and let the corresponding heat kernel for $\Omega$ be denoted by $H$, while the heat kernel for $\mathbb{R}^n$ is

\[
K(t,z,z') = (4\pi t)^{-n/2}\, e^{-|z - z'|^2/4t}.
\tag{1}
\]

Fig. 2 Above, we have the polygonal domain $\Omega$ which contains the triangular domain, $\Omega_0$. Letting $S = S_\gamma$ be a circular sector of opening angle $\gamma$ and infinite radius, this is an example of an “exact geometric match,” in the sense that $\Omega_0$ is equal to a piece of $S$


Let

\[ \delta = \min\{d(z,\partial\Omega),\, d(z',\partial\Omega)\}. \]

Then, there are constants $A, B > 0$ such that

\[ |K(t,z,z') - H(t,z,z')| \le A\,t^{-n/2}\,e^{-B\delta^2/t}. \]

This means that the heat kernel for $\Omega$ is $O(t^\infty)$ close$^1$ to the Euclidean heat kernel, as long as we consider points $z$, $z'$ which are at a positive distance from the boundary. Hence the heat kernel “does not feel the boundary.”
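The simplest explicit instance of this principle is one-dimensional. The sketch below takes the half-line $\Omega = (0,\infty)$, which is unbounded and therefore outside the bounded setting described above; it is meant only as an illustration. It uses the standard method-of-images formula for the Dirichlet heat kernel on the half-line.

```latex
% Illustration on the half-line \Omega = (0,\infty), n = 1 (not a bounded domain).
% Free heat kernel and Dirichlet heat kernel (method of images), for z, z' > 0:
\[
K(t,z,z') = (4\pi t)^{-1/2} e^{-|z-z'|^2/4t},
\qquad
H(t,z,z') = K(t,z,z') - K(t,z,-z').
\]
% The difference is the image term alone:
\[
|K(t,z,z') - H(t,z,z')| = (4\pi t)^{-1/2} e^{-(z+z')^2/4t}
\le (4\pi t)^{-1/2} e^{-\delta^2/t},
\]
% since d(z,\partial\Omega) = z and d(z',\partial\Omega) = z', so z + z' \ge 2\delta.
```

In this example one may therefore take $A = (4\pi)^{-1/2}$ and $B = 1$ in a bound of the form $A\,t^{-n/2}e^{-B\delta^2/t}$ with $n = 1$. The bound is useless as $\delta \to 0$, which is exactly why a locality principle that holds up to the boundary needs a different argument.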

In a similar spirit, a more general locality principle is known to be true. The idea is that one has a collection of sets which are “exact geometric matches” to certain pieces of the domain, $\Omega$. To describe the meaning of an “exact geometric match,” consider a piece of the first quadrant near the origin in $\mathbb{R}^2$. A sufficiently small piece is an exact match for a piece of a rectangle near a corner. Similarly, for a surface with exact conical singularities, near a singularity of opening angle $\gamma$, a piece of an infinite cone with the same opening angle is an exact geometric match to a piece of the surface near that singularity. For a planar example, see Fig. 2. The locality principle states that if one takes the heat kernels for those “exact geometric matches,” and restricts them to the corresponding pieces of the domain (or manifold), $\Omega$, then those “model heat kernels” are equal to the heat kernel for $\Omega$, restricted to the corresponding pieces of $\Omega$, with error $O(t^\infty)$ as $t \downarrow 0$.


$^1$ By $O(t^\infty)$, we mean $O(t^N)$ for any $N \in \mathbb{N}$.

This locality principle is incredibly useful, because if one has exact geometric matches for which one can explicitly compute the heat kernel, then one can use these to compute the short time asymptotic expansion of the heat trace. Moreover, in addition to being able to compute the heat trace expansion, one can also use this locality principle to compute the zeta regularized determinant of the Laplacian as in [1].

Here, we shall give one application of the locality principle: “how to hear the corners of a drum.”


Theorem 1 Let $\Omega \subset \mathbb{R}^2$ be a simply connected, bounded, Lipschitz planar domain with piecewise smooth boundary. Moreover, assume that the (finitely many) points at which the boundary is not smooth are exact corners; that is, there exists a neighborhood of each corner in which the boundary of $\Omega$ is the union of two straight line segments. Assume that for at least one such corner, the interior angle is not equal to $\pi$.

Then the Laplacian with either Dirichlet,$^2$ Neumann, or Robin boundary condition is not isospectral to the Laplacian with the same boundary condition$^3$ on any smoothly bounded domain.


To prove the result, we use a locality principle which is stated and proved in Sect. 2. We next introduce model heat kernels as well as the corresponding Green’s functions for the “exact geometric matches” in Sect. 3. We proceed there to use the models together with our locality principle to compute the short time asymptotic expansion of the heat trace. Theorem 1 is then a consequence of comparing the heat trace expansions in the presence and lack of corners. In conclusion, we explain how the locality principle fails to prove Theorem 1 for the case of curvilinear polygonal domains, in which the corners are not exact. An example of a non-exact corner of interior angle $\pi/2$ is the corner where the straight edge meets the curved edge in a half-circle. This motivates the discussion in Sect. 4 concerning the necessity and utility of microlocal analysis, in particular, the construction of the heat kernel for curvilinear polygonal domains and surfaces via the heat space and heat calculus in those settings. This construction, together with a generalization of Theorem 1 to all boundary conditions (including discontinuous, mixed boundary conditions), as well as to surfaces with both conical singularities and edges, is currently in preparation and shall be presented in forthcoming work [23].


$^2$ This result was proven in the Dirichlet case in [16].


$^3$ In particular, in the case of Robin boundary conditions, we assume the same Robin parameters for both domains.

2 The Locality Principle


We begin by setting notations and sign conventions and recalling fundamental concepts.


2.1 Geometric and Analytic Preliminaries 


To state the locality principle, we make the notion of an “exact geometric match” precise. Let $\Omega$ be a domain, possibly infinite, contained in $\mathbb{R}^n$.


Definition 1 Assume that $\Omega_0 \subset \Omega \subset \mathbb{R}^n$, and $S \subset \mathbb{R}^n$. We say that $S$ and $\Omega$ are exact geometric matches on $\Omega_0$ if there exists a sub-domain $\Omega_c \subseteq \Omega$ which compactly contains $\Omega_0$ and which is isometric to a sub-domain of $S$ (which, abusing notation, we also call $\Omega_c$). Recall that $\Omega_0$ being compactly contained in $\Omega_c$ means that the distance from $\Omega_0$ to $\Omega \setminus \Omega_c$ is positive. A planar example is depicted in Fig. 2.
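A concrete instance of Definition 1 may help fix the roles of the four sets. The sets below are hypothetical choices made only for illustration (a unit square matched with the first quadrant near one of its corners); they do not appear in the paper.

```latex
% Hypothetical concrete sets, chosen only for illustration:
\[
\Omega = (0,1)^2, \qquad S = (0,\infty)^2, \qquad
\Omega_0 = \big(0,\tfrac14\big)^2, \qquad \Omega_c = \big(0,\tfrac12\big)^2 .
\]
% \Omega_c is contained both in \Omega and in S (the identity map is the isometry),
% and \Omega_0 is compactly contained in \Omega_c in the sense of Definition 1:
\[
\operatorname{dist}\big(\Omega_0,\ \Omega \setminus \Omega_c\big) = \tfrac12 - \tfrac14 = \tfrac14 > 0 .
\]
```

Here $S$ and $\Omega$ are exact geometric matches on $\Omega_0$: near the corner at the origin the square is indistinguishable from the quadrant, and the buffer $\Omega_c \setminus \Omega_0$ keeps $\Omega_0$ at distance $1/4$ from the part of the square where the two domains differ.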


Next, we recall the heat kernel in this context. The heat kernel, $H$, is the Schwartz kernel of the fundamental solution of the heat equation. It is therefore defined on $\Omega \times \Omega \times [0,\infty)$, and satisfies

\[
H(t,z,z') = H(t,z',z), \qquad (\partial_t + \Delta)H(t,z,z') = 0 \ \text{ for } t > 0,
\]

\[
H(0,z,z') = \delta(z - z'), \qquad \text{in the distributional sense.}
\]

Throughout we use the sign convention for the Laplacian, $\Delta$, on $\mathbb{R}^n$, that

\[
\Delta = -\sum_{j=1}^{n} \partial_j^2 .
\]

We consider two boundary conditions:


(N) the Neumann boundary condition, which requires the normal derivative of the function to vanish on the boundary;

(R) the Robin boundary condition, which requires the function, $u$, to satisfy the following equation on the boundary:

\[
\alpha u + \beta \frac{\partial u}{\partial \nu} = 0, \qquad \frac{\partial}{\partial \nu} \text{ is the outward pointing normal derivative.}
\tag{2}
\]

For $u_0 \in L^2(\Omega)$, the heat equation with initial data given by $u_0$ is then solved by

\[
u(t,z) = \int_\Omega H(t,z,z')\,u_0(z')\,dz'.
\]

Moreover, if $\Omega$ is a bounded domain, and $\{\phi_k\}_{k \ge 1}$ is an orthonormal basis for $L^2(\Omega)$ consisting of eigenfunctions of the Laplacian satisfying the appropriate boundary condition, with corresponding eigenvalues $\{\lambda_k\}_{k \ge 1}$, then the heat kernel is

\[
H(t,z,z') = \sum_{k \ge 1} e^{-\lambda_k t}\,\phi_k(z)\,\phi_k(z').
\]
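For a domain where the eigenfunctions are explicit, the expansion can be written out completely. The example below is an illustration only: it takes the interval $(0,\pi)$ with the Neumann boundary condition, and, unlike the text, indexes the eigenfunctions from $k = 0$ so that $\lambda_k = k^2$.

```latex
% Illustration: Neumann Laplacian \Delta = -d^2/dz^2 on the interval (0,\pi).
% Orthonormal eigenfunctions and eigenvalues, indexed here from k = 0:
\[
\phi_0(z) = \frac{1}{\sqrt{\pi}}, \qquad
\phi_k(z) = \sqrt{\frac{2}{\pi}}\,\cos(kz) \quad (k \ge 1), \qquad
\lambda_k = k^2 \quad (k \ge 0).
\]
% Heat kernel from the eigenfunction expansion:
\[
H(t,z,z') = \frac{1}{\pi} + \frac{2}{\pi}\sum_{k \ge 1} e^{-k^2 t}\cos(kz)\cos(kz').
\]
% Heat trace: integrate on the diagonal, using \int_0^\pi \phi_k(z)^2\,dz = 1:
\[
\int_0^\pi H(t,z,z)\,dz = \sum_{k \ge 0} e^{-k^2 t}.
\]
```

The heat trace depends only on the eigenvalues, which is why its short time expansion is a spectral invariant and can be used to compare isospectral domains.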


2.2 Locality Principle for Dirichlet Boundary Condition 


In the general context of domains in R” which have only piecewise smooth 
boundary, the key point is that the locality principle should hold up to the boundary. 
This differs from many previous presentations of a locality principle. For example, 
in [14, Theorem 1.1], it is proved that without any condition on the regularity of the 
boundary, for any choice of self-adjoint extension of the Laplacian on 2 C R”, the 
heat kernel for this self adjoint extension of the Laplacian on 2, denoted by H® 
satisfies 


exp (—2@+ee@'»” 
2 / 0 U /\—-n P 4t 
|H (t,z,z) —H (t,z,Z)| < (Capz, Zz’) + Cp) - n+l 1 
pir i-2 
Above, $H^0$ is the heat kernel for $\mathbb{R}^n$, $\rho(z) = \operatorname{dist}(z,\partial\Omega)$, and $\rho(z,z') = \min(\rho(z),\rho(z'))$. The constants $C_a$ and $C_b$ can also be calculated explicitly according to [14]. Clearly, the estimate loses its utility as one approaches the boundary.

In the case of smoothly bounded domains, there is a result of Lück and Schick [17, Theorem 2.26], which implies the locality principle for both the Dirichlet and Neumann boundary conditions, and which holds all the way up to the boundary. We recall that result.$^4$


Theorem 2 (Lück and Schick) Let $N$ be a Riemannian manifold possibly with boundary which is of bounded geometry. Let $V \subset N$ be a closed subset which carries the structure of a Riemannian manifold of the same dimension as $N$ such that the inclusion of $V$ into $N$ is a smooth map respecting the Riemannian metrics. For fixed $p \ge 0$, let $\Delta[V]$ and $\Delta[N]$ be the Laplacians on $p$-forms on $V$ and $N$, considered as unbounded operators with either absolute boundary conditions or with relative boundary conditions (see Definition 2.2 of [17]). Let $\Delta[V]^k e^{-t\Delta[V]}(x,y)$ and $\Delta[N]^k e^{-t\Delta[N]}(x,y)$ be the corresponding smooth integral kernels. Let $k$ be a non-negative integer.

Then there is a monotone decreasing function $C_k(K) : (0,\infty) \to (0,\infty)$ which depends only on the geometry of $N$ (but not on $V$, $x$, $y$, $t$) and a constant $C_2$ depending only on the dimension of $N$ such that for all $K > 0$ and $x, y \in V$ with $d_V(x) := d(x, N \setminus V) \ge K$, $d_V(y) \ge K$ and all $t > 0$:

\[
\big|\Delta[V]^k e^{-t\Delta[V]}(x,y) - \Delta[N]^k e^{-t\Delta[N]}(x,y)\big|
\le C_k(K)\, e^{-\frac{d_V(x)^2 + d_V(y)^2 + d(x,y)^2}{C_2 t}} .
\]


$^4$ In the original statement of their result, Lück and Schick make the parenthetical remark “We make no assumptions about the boundaries of N and V and how they intersect.” This could easily be misunderstood. If one carefully reads the proof, it is implicit that the boundaries are smooth. The arguments break down if the boundaries have singularities, such as corners. For this reason, we have omitted the parenthetical remark from the statement of the theorem.

One may therefore compare the heat kernels for the Laplacian acting on 
functions, noting (see p. 362 of [28]) that relative boundary conditions are Dirichlet 
boundary conditions, and absolute boundary conditions are Neumann boundary 
conditions. We present this as a corollary to Lück and Schick’s theorem.


Corollary 1 Assume that $S$ is an exact match for $\Omega_0 \subset \Omega$, for two smoothly bounded domains, $\Omega$ and $\Omega_0$ in $\mathbb{R}^n$. Assume the same boundary condition, either Dirichlet or Neumann, for the Euclidean Laplacian on both domains. Then

\[
|H^S(t,z,z') - H^\Omega(t,z,z')| = O(t^\infty) \quad \text{as } t \downarrow 0, \text{ uniformly for } z, z' \in \Omega_0 .
\]


Proof We use the theorem of Lück and Schick twice, once with $N = \Omega$ and once with $N = S$, with $V = \Omega_c$ in both cases. We set $k = 0$ and

\[ K = \alpha = d(\Omega_0, S \setminus \Omega_c). \]

By the definition of an exact geometric match, $\alpha > 0$. In the $N = S$ case, the theorem reads

\[
|H^S(t,z,z') - H^{\Omega_c}(t,z,z')|
\le C_0(\alpha)\, e^{-\frac{|\operatorname{dist}(z,S\setminus\Omega_c)|^2 + |\operatorname{dist}(z',S\setminus\Omega_c)|^2 + |z - z'|^2}{C_2 t}}
\le C_0(\alpha)\, e^{-\frac{2\alpha^2}{C_2 t}} .
\]

We conclude that

\[ |H^S(t,z,z') - H^{\Omega_c}(t,z,z')| = O(t^\infty) \]

uniformly on $\Omega_0$. The same statement holds with $S$ replaced by $\Omega$, and then the triangle inequality completes the proof. □
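The step from an exponential bound of the form $e^{-c/t}$ to $O(t^\infty)$ in the proof above is elementary but worth recording once. The following short computation is a sketch of it; the symbol $c$ is introduced here only for illustration and stands for any positive constant, such as $2\alpha^2/C_2$ in the proof.

```latex
% For c > 0, t > 0 and any N \in \mathbb{N}: every term of the exponential series is nonnegative,
% so e^{x} \ge x^N / N! for x \ge 0. With x = c/t:
\[
e^{c/t} \ge \frac{1}{N!}\Big(\frac{c}{t}\Big)^N
\quad\Longrightarrow\quad
e^{-c/t} \le \frac{N!}{c^N}\, t^N .
\]
% Hence a bound C\,e^{-c/t}, with C and c independent of z, z', is O(t^N) for every N,
% uniformly in z, z'.
```

The uniformity on $\Omega_0$ comes from the fact that neither the constant $C_0(\alpha)$ nor the exponent depends on the points $z$, $z'$ once they are restricted to $\Omega_0$.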


The assumption of smooth boundary is quite restrictive, and the proof in [17] relies heavily on this assumption. To the best of our knowledge, the first locality result which holds all the way up to the boundary and includes domains which have only piecewise smooth boundary, but may have corners, was demonstrated by van den Berg and Srisatkunarajah [29]. We note that this result is not stated in the precise form below in [29], but upon careful reading, it is straightforward to verify that this result is indeed proven in [29] and is used in several calculations therein.

Theorem 3 (van den Berg and Srisatkunarajah) Let $\Omega \subset \mathbb{R}^2$ be a polygonal domain. Let $H^\Omega$ denote the heat kernel for the Laplacian on $\Omega$ with the Dirichlet boundary condition. Then, for $S = S_\gamma$, a sector of opening angle $\gamma$, and for any corner of $\Omega$ with opening angle $\gamma$, there is a neighborhood of the corner $\mathcal{N}_\gamma$ such that

\[
|H^\Omega(t,z,z') - H^{S_\gamma}(t,z,z')| = O(t^\infty), \quad \text{uniformly } \forall (z,z') \in \mathcal{N}_\gamma \times \mathcal{N}_\gamma .
\]

Above, $H^{S_\gamma}$ denotes the heat kernel for $S_\gamma$ with the Dirichlet boundary condition. Moreover, for any $\mathcal{N}_e \subset \Omega$ which is at a positive distance to all corners of $\Omega$,

\[
|H^\Omega(t,z,z') - H^{\mathbb{R}^2_+}(t,z,z')| = O(t^\infty), \quad \text{uniformly } \forall (z,z') \in \mathcal{N}_e \times \mathcal{N}_e .
\]

Above, $H^{\mathbb{R}^2_+}$ denotes the heat kernel for a half space with the Dirichlet boundary condition.


The proof uses probabilistic methods. We are currently unaware of a general- 
ization to domains with corners in higher dimensions. However, it is reasonable 
to expect that such a generalization holds. Since Theorem 1 has already been
demonstrated for the Dirichlet boundary condition in [16], we are interested in the 
Neumann and Robin boundary conditions. For this reason, we shall give a proof 
of a locality principle for both Neumann and Robin boundary conditions which 
holds in all dimensions, for domains with piecewise smooth boundary (in fact, only 
piecewise @* boundary is required), as long as we have a suitable estimate on the 
second fundamental form on the boundary. Moreover, our locality principle, similar 
to that of [29], allows one to compare the heat kernels all the way up to the boundary. 
For this reason, the locality principles demonstrated below may be of independent 
interest. 


2.3 Locality Principle for Neumann Boundary Condition 


Here we prove a locality principle for the Neumann boundary condition for domains
in $\mathbb{R}^n$ with piecewise $C^3$ boundary satisfying some relatively general geometric
assumptions. Since we consider both bounded and unbounded domains, we require
a uniform version of an interior cone condition:


Definition 2 Let $\varepsilon > 0$ and $h > 0$. We say that a domain $\Omega \subset \mathbb{R}^n$ satisfies the
$(\varepsilon, h)$-cone condition if, for every $x \in \partial\Omega$, there exists a ball $B(x, \delta)$ centered at
$x$ of radius $\delta$, and a direction $\xi_x$, such that for all $y \in B(x, \delta) \cap \Omega$, the cone with
vertex $y$ directed by $\xi_x$ of opening angle $\varepsilon$ and height $h$ is contained in $\Omega$.


Definition 3 Let $\varepsilon > 0$ and $h > 0$. We say that a domain $\Omega \subset \mathbb{R}^n$ satisfies the two-sided
$(\varepsilon, h)$-cone condition if both $\Omega$ and $\mathbb{R}^n \setminus \Omega$ satisfy the $(\varepsilon, h)$-cone condition.
Theorem 4 (Locality Principle for Neumann Boundary Condition) Let $\Omega$, $\Omega_0$,
and $S$ be domains in $\mathbb{R}^n$ such that $S$ and $\Omega$ are exact geometric matches on $\Omega_0$,
as in Definition 1. Assume that both $\Omega$ and $S$ satisfy the two-sided $(\varepsilon, h)$-cone
condition for some $\varepsilon > 0$ and $h > 0$. Let $H^\Omega$ denote the heat kernel associated to
the Laplacian on $\Omega$, and let $H^S$ denote the heat kernel on $S$, with the same boundary
condition for $\partial S$ as taken on $\partial\Omega$. Moreover, assume that there exists $\sigma \in \mathbb{R}$ such
that the second fundamental form satisfies $\mathrm{II} \ge -\sigma$ on all the $C^3$ pieces of $\partial\Omega$ and $\partial S$.
Then

\[ |H^\Omega(t,z,z') - H^S(t,z,z')| = O(t^\infty) \text{ as } t \downarrow 0, \quad \text{uniformly for } z, z' \in \Omega_0. \]


Proof We use a patchwork parametrix construction, as discussed in section 3.2 of 
[1]. This is a general technique to construct heat kernels whenever one has exact 
geometric matches for each part of a domain; see for example [1], [18] and [26]. 

Let $\{\chi_j\}_{j=1}^2$ be a $C^\infty$ partition of unity on $\Omega$. Assume that $\tilde\chi_j \in C^\infty(\Omega)$ is
identically 1 on a small neighborhood of the support of $\chi_j$ and vanishes outside a
slightly larger neighborhood. In particular, we choose $\chi_1$ to be identically equal to
one on $\Omega_0$. Choose $\tilde\chi_1$ to be identically one on a strictly larger neighborhood and
to have its support equal to $\Omega_c$. We assume that the support of $\tilde\chi_2$ does not intersect
$\Omega_0$. We then define the patchwork heat kernel

\[ G(t,z,z') := \tilde\chi_1(z)\, H^S(t,z,z')\, \chi_1(z') + \tilde\chi_2(z)\, H^\Omega(t,z,z')\, \chi_2(z'). \]

We claim that uniformly for all $z, z' \in \Omega_0$,

\[ |H^\Omega(t,z,z') - G(t,z,z')| = O(t^\infty), \quad t \downarrow 0. \]

That is, we claim that the patchwork heat kernel is equal to the true heat kernel
with an error that is $O(t^\infty)$ for small time. This claim immediately implies our
result, since on $\Omega_0$, $\chi_1 = 1$ and $\tilde\chi_1 = 1$, whereas $\chi_2$ and $\tilde\chi_2$ both vanish, and thus
$G(t,z,z') = H^S(t,z,z')$.

To prove the claim, we follow the usual template. Observe that

\[ E(t,z,z') := (\partial_t + \Delta) G(t,z,z') = [\Delta, \tilde\chi_1(z)]\, H^S(t,z,z')\, \chi_1(z') + [\Delta, \tilde\chi_2(z)]\, H^\Omega(t,z,z')\, \chi_2(z'). \]

Each commutator $[\Delta, \tilde\chi_j(z)]$ is a first-order differential operator with support a
positive distance from the support of $\chi_j$. Thus $E(t,z,z')$ is a sum of model heat
kernels and their first derivatives, cut off so that their spatial arguments are a positive
distance from the diagonal. We claim each such term is $O(t^\infty)$. To obtain this
estimate, we use [30, Theorem 1.1], which gives the estimate

\[ |\nabla H^D(t,z,z')| \le C_a\, t^{-(n+1)/2} \exp\left( -\frac{|z-z'|^2}{C_b\, t} \right) \]

for some constants $C_a, C_b > 0$, for $D = \Omega$ and $D = S$. The setting there is
not identical, so we note the places in the proof where minor modifications are
required. First, the assumption that $\Omega$ is compact is used there to obtain estimates
for all $t > 0$. In particular, the discreteness of the spectrum is used to obtain long
time estimates by exploiting the first positive eigenvalue in (2.1) of [30]. Since
we are only interested in $t \downarrow 0$, this long time estimate is not required. Next,
compactness is used to be able to estimate the volume of balls, $|B(x, \sqrt{t})| \ge C_n t^{n/2}$,
for a uniform constant $C_n$. However, we have this estimate due to the two-sided
$(\varepsilon, h)$-cone condition which is satisfied for both $\Omega$ and $S$ which are contained in
$\mathbb{R}^n$. Moreover, we have verified (Wang, private communication) that the assumption
of piecewise $C^3$ boundary (rather than $C^\infty$ boundary) is sufficient for the proof of
[30, Theorem 1.1], as well as the references used therein: [31-33].⁵

Since the domains $S$ and $\Omega$ satisfy the two-sided $(\varepsilon, h)$-cone condition, there are
Gaussian upper bounds for the corresponding Neumann heat kernels. Specifically,
as a result of [4, Theorems 6.1, 4.4], for any $T > 0$, there exist $C_1, C_2 > 0$ such that

\[ |H^S(t,z,z')| \le C_1 t^{-n/2} e^{-\frac{|z-z'|^2}{C_2 t}}, \qquad |H^\Omega(t,z,z')| \le C_1 t^{-n/2} e^{-\frac{|z-z'|^2}{C_2 t}} \tag{3} \]

on $(0,T] \times S \times S$ and $(0,T] \times \Omega \times \Omega$ respectively. The upshot is that each term in
the sum defining $E(t,z,z')$ is uniformly $O(t^\infty)$ for all $z$ and $z'$ in $\Omega$, and therefore

\[ |E(t,z,z')| = O(t^\infty). \]
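
The step from the Gaussian bound (3) to $O(t^\infty)$ can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it evaluates the right-hand side of (3) at a fixed positive distance `d` and divides by a power of `t`. The function names and the choice $C_1 = C_2 = 1$, $n = 2$ are assumptions made only for illustration.

```python
import math

def gaussian_bound(t, d, n=2, c1=1.0, c2=1.0):
    """Right-hand side of the Gaussian bound (3) at spatial distance d.

    c1, c2 stand in for the constants C_1, C_2 (illustrative values).
    """
    return c1 * t ** (-n / 2) * math.exp(-d * d / (c2 * t))

def ratio_to_power(t, d, power, n=2):
    """gaussian_bound / t**power.

    For any fixed d > 0 and any fixed power, this tends to 0 as t -> 0,
    which is what O(t^infinity) means for the off-diagonal terms.
    """
    return gaussian_bound(t, d, n) / t ** power

if __name__ == "__main__":
    # The ratio eventually decreases rapidly even for a large power.
    for t in (0.1, 0.05, 0.02, 0.01):
        print(t, ratio_to_power(t, d=1.0, power=10))
```

The exponential factor $e^{-d^2/(C_2 t)}$ beats every negative power of $t$, which is why only a positive distance between the supports matters, not its size.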


From here, the error may be iterated away using the usual Neumann series
argument, as in [19] or Section 4 of [26]. Letting $*$ denote the operation of
convolution in time and composition in space, define

\[ K := E - E * E + E * E * E - \cdots. \]

It is an exercise in induction to see that $K(t,z,z')$ is well-defined and also $O(t^\infty)$
as $t$ goes to zero; see for example the proof of parts a) and b) of Lemma 13 of [26].
Note that $\Omega$ is compact, which is key. Then the difference of the true heat kernel
and the patchwork heat kernel is

\[ H^\Omega(t,z,z') - G(t,z,z') = -(G * K)(t,z,z'). \]


As in Lemma 14 of [26], this can also be bounded in straightforward fashion by
$O(t^\infty)$, which completes the proof. □

⁵ We have also verified in private communication with F. Y. Wang that the arguments in [30-33]
apply equally well under the curvature assumption $\mathrm{II} \ge -\sigma$ for piecewise $C^3$ boundary.
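
The claim in the proof that each commutator is a first-order operator supported away from the support of the corresponding partition function can be written out explicitly. The following is a sketch under the assumption that $\Delta = -\sum_j \partial_j^2$ is the positive Laplacian (consistent with the heat operator $\partial_t + \Delta$ used above); with the opposite sign convention both terms change sign.

```latex
% Commutator of the (positive) Laplacian with a smooth cutoff \tilde\chi,
% applied to a smooth function f:
\[
  [\Delta, \tilde\chi] f
  = \Delta(\tilde\chi f) - \tilde\chi\, \Delta f
  = (\Delta \tilde\chi)\, f - 2\, \nabla \tilde\chi \cdot \nabla f .
\]
% Both coefficients involve derivatives of \tilde\chi only. Since
% \tilde\chi_j \equiv 1 on a neighborhood of supp(\chi_j), we have
% \nabla \tilde\chi_j = 0 and \Delta \tilde\chi_j = 0 there, so the
% commutator is supported at a positive distance from supp(\chi_j).
```

This is why only off-diagonal values of the model heat kernels and of their gradients enter the error term $E$.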


The key ingredients in this patchwork construction are: (1) the model heat kernels 
satisfy off-diagonal decay estimates, and (2) the gradients of these model heat 
kernels satisfy similar estimates. The argument can therefore be replicated in any 
situation where all models satisfy those estimates. Here is one generalization: 


Corollary 2 Using the notation of Theorem 4, suppose that $\Omega$ is compact and that
the heat kernels on both $\Omega$ and $S$ satisfy off-diagonal bounds of the following form:
if $A$ and $B$ are any two sets with $d(A, B) > 0$, then uniformly for $z \in A$ and $z' \in B$,
we have

\[ |H(t,z,z')| + |\nabla H(t,z,z')| = O(t^\infty) \text{ as } t \to 0. \tag{4} \]

Then the conclusion of Theorem 4 holds.


Proof Apply the same method, with a partition of unity on $\Omega$ consisting of just two
components, one cutoff function for $\Omega_0$ where we use the model heat kernel $H^S$,
and one cutoff function for the rest of $\Omega$ where we use $H^\Omega$. The result follows. □


Remark 1 The bounds (4) are satisfied, for example, by Neumann heat kernels on
compact, convex domains with no smoothness assumptions on the boundary [30],
as well as by both Dirichlet and Neumann heat kernels on sectors, half-spaces, and
Euclidean space.


2.4 Locality for Robin Boundary Condition 


In this section, we determine when locality results similar to those of Theorem 4
hold for the Robin problem. The answer is that in many cases they may be deduced
from locality of the Neumann heat kernels. We consider a generalization of the
classical Robin boundary condition (2):

\[ \frac{\partial u}{\partial n} + c(x) u(x) = 0, \quad x \in \text{the smooth pieces of } \partial D. \tag{5} \]

In the first version of the locality principle, to simplify the proof, we shall
assume that $\Omega \subset S$, and that $\Omega$ is bounded. We note, however, that both of
these assumptions can be removed in the corollary to the theorem. The statement
of the theorem may appear somewhat technical, so we explain the geometric
interpretations of the assumptions. Conditions (1) and (2) below are clear; they are
required to apply our references [30] and [4]. Items (3), (4), and (5) mean that the
(possibly unbounded) domain $D = S$ (and, as in the corollary, in which $\Omega$ may be
unbounded, $D = \Omega$) has boundary which does not oscillate too wildly or "bunch
up" and become space-filling. These assumptions are immediately satisfied when
the domains are bounded or if the boundary consists of finitely many straight pieces
(like a sector in $\mathbb{R}^2$, for example).


Theorem 5 (Locality Principle for Robin Boundary Condition) Assume that $\Omega$
and $S$ are exact geometric matches on $\Omega_0$, as in Definition 1, with $\Omega_0 \subset \Omega \subset S \subset \mathbb{R}^n$.
Assume that $\Omega$ is bounded. Let $K^S(t,x,y)$ and $K^\Omega(t,x,y)$ be the heat kernels
for the Robin Laplacian with boundary condition (5) for $D = S$ and $D = \Omega$,
respectively, for the same $c(x) \in L^\infty(\partial S \cup \partial\Omega)$. Let $\alpha := \operatorname{dist}(\Omega_0, S \setminus \Omega)$, and
note that $\alpha > 0$ by our assumption of an exact geometric match. Define the auxiliary
domain

\[ W := \{ x \in \Omega : d(x, \Omega_0) < \alpha/2 \}. \]

We make the following geometric assumptions:

1. Both $S$ and $\Omega$ satisfy the two-sided $(\varepsilon, h)$-cone condition;

2. Both $S$ and $\Omega$ have piecewise $C^3$ boundaries, and there exists a constant $\sigma \in \mathbb{R}$
such that the second fundamental form satisfies $\mathrm{II} \ge -\sigma$ on all the $C^3$ pieces of
both $\partial S$ and $\partial\Omega$;

3. For any sufficiently small $r > 0$ and any $t > 0$, we have

\[ \sup_{x \in W} \int_0^t \int_{\partial S \setminus B(x,r)} s^{-n/2} e^{-\frac{|x-z|^2}{s}} \, \sigma(dz)\, ds < \infty; \tag{6} \]

4. For all $r > 0$ and all $x \in \mathbb{R}^n$, and both $D = S$ and $D = \Omega$, there is a constant
$C_D$ such that

\[ \mathcal{H}^{n-1}(\partial D \cap B(x,r)) \le C_D \operatorname{Vol}_{n-1}(B_{n-1}(x,r)), \tag{7} \]

where $\mathcal{H}^{n-1}$ denotes the $n-1$ dimensional Hausdorff measure;

5. If $G_n(x,y)$ is the free Green's function on $\mathbb{R}^n$, we have

\[ \sup_{x \in W} \int_{\partial\Omega} G_n(x,y) \, \sigma(dy) < \infty. \tag{8} \]

Then, uniformly on $\Omega_0 \times \Omega_0$, we have Robin locality:

\[ |K^S(t,z,z') - K^\Omega(t,z,z')| = O(t^\infty), \quad \forall z, z' \in \Omega_0, \quad t \to 0. \]


The assumptions that $\Omega \subset S$ and that $\Omega$ is bounded can both be removed:


Corollary 3 Suppose we have an exact geometric match between $\Omega$ and $S$ on the
bounded domain $\Omega_0$, and the Robin coefficient $c(x)$ agrees on a common open,
bounded neighborhood $\Omega_c$ of $\Omega_0$ in $\Omega$ and $S$. Then, as long as Theorem 5 holds
for the pairs $(\Omega_0, \Omega)$ and $(\Omega_0, S)$, the conclusion of Theorem 5 holds for the pair
$(\Omega, S)$.

Proof Apply Theorem 5 to the pairs $(\Omega_0, \Omega)$ and $(\Omega_0, S)$, using the same $W$, then
use the triangle inequality. □


Before we prove Theorem 5, we discuss the geometric assumptions (6), (7), 
and (8), and give some sufficient conditions for them to hold. First, observe that 
regardless of what W is, (6) is immediately valid if S is a bounded domain whose 
boundary has finite n — 1 dimensional Lebesgue measure. It is also valid if S is an 
infinite circular sector, by a direct computation, part of which is presented below. 


Example 1 Let $S = S_\gamma \subset \mathbb{R}^2$ be a circular sector of opening angle $\gamma$ and infinite
radius. Assume that $W$ and $\Omega$ are bounded domains such that $W \subset \Omega \subset S$, and
assume for simplicity that $W$ contains the corner of $S$; see Fig. 2 (the case where
this does not happen is similar). Then (6) holds. Indeed, let $r \in (0, \alpha/2)$ and $t > 0$;
then
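\[
\begin{aligned}
\sup_{x \in W} \int_0^t \int_{\partial S \setminus B(x,r)} \frac{1}{s} e^{-\frac{|x-z|^2}{s}} \, \sigma(dz)\, ds
&\le \sup_{x \in W} \int_0^t \int_{\partial S \setminus \partial\Omega} \frac{1}{s} e^{-\frac{|x-z|^2}{s}} \, \sigma(dz)\, ds
 + \sup_{x \in W} \int_0^t \int_{(\partial S \cap \partial\Omega) \setminus B(x,r)} \frac{1}{s} e^{-\frac{|x-z|^2}{s}} \, \sigma(dz)\, ds \\
&\le 2 \int_0^t \int_{\alpha/2}^{\infty} \frac{1}{s} e^{-\frac{\tau^2}{s}} \, d\tau\, ds
 + \int_0^t \frac{1}{s} e^{-\frac{r^2}{s}} \int_{\partial S \cap \partial\Omega} \sigma(dz)\, ds < \infty.
\end{aligned}
\]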


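Moreover, recalling the Green's function in two dimensions (9), we also have

\[ \sup_{x \in W} \int_{\partial\Omega \cap B(x,r)} G_2(x,y) \, \sigma(dy) = \sup_{x \in W} \int_{\partial\Omega \cap B(x,r)} \big| \ln|x-z| \big| \, \sigma(dz) \le \int_0^{2\alpha} |\ln t| \, dt < \infty. \]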


As for (7), this is automatic if $D$ is a bounded domain with piecewise $C^1$
boundary. It is also true if $D$ is a circular sector (in fact here $C_D = 2$).
The condition (8) is also easy to satisfy:


Proposition 1 Assume that $\Omega$ is a bounded domain in $\mathbb{R}^n$ which has piecewise $C^3$
boundary. Let $W \subset \Omega$ be a compact set; then (8) holds.


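Proof Recall that

\[ G_n(x,y) = \begin{cases} \big| \ln|x-y| \big|, & \text{if } n = 2; \\ |x-y|^{2-n}, & \text{if } n \ge 3. \end{cases} \tag{9} \]

Since $W$ is compact, it is enough to prove that

\[ x \mapsto \int_{\partial\Omega} G_n(x,y) \, \sigma(dy) \tag{10} \]

is a continuous function on $W$.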
Fix $x \in W$. Let $\varepsilon > 0$ and let $\{x_j\}_{j \in \mathbb{N}} \subset W$ be a sequence such that $x_j \to x$. Since
$\partial\Omega$ is piecewise $C^3$, and $G_n(x,y)$ is in $L^1_{\mathrm{loc}}$, we can choose $\delta > 0$ such that

\[ \int_{\partial\Omega \cap B(x,2\delta)} G_n(x,y) \, \sigma(dy) < \varepsilon, \qquad \int_{\partial\Omega \cap B(x,2\delta)} G_n(x_j,y) \, \sigma(dy) < \varepsilon, \tag{11} \]

for sufficiently large $j \in \mathbb{N}$, such that for these $j$ we also have $|x - x_j| < \delta$. To
see this, we note that $G_n(x,y) = G_n(|x-y|) = G_n(r)$, where $r = |x-y|$, and
similarly, $G_n(x_j,y) = G_n(r_j)$ with $r_j = |x_j - y|$. Thus, choosing the radius $2\delta$
sufficiently small, since $G_n$ is locally $L^1(\partial\Omega)$ integrable, and $\partial\Omega$ is piecewise $C^3$,
we can make the above integrals as small as we like.

Now, we note that $G_n(x_j,y) \to G_n(x,y)$ as $j \to \infty$, for $y \in \partial\Omega \setminus B(x,2\delta)$.
Moreover, since $\Omega$ and thus $\partial\Omega$ are both compact, $G_n(x_j,y) \le C = C(\delta)$ for
$y \in \partial\Omega \setminus B(x,2\delta)$. The Dominated Convergence Theorem therefore implies

\[ \left| \int_{\partial\Omega \setminus B(x,2\delta)} \big( G_n(x_j,y) - G_n(x,y) \big) \, \sigma(dy) \right| < \varepsilon \]

for sufficiently large $j \in \mathbb{N}$. This, together with (11), implies that the function (10)
is continuous on $W$. □
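
The local integrability used to obtain (11) can be made explicit in the simplest model situation. The following is a sketch under an added simplifying assumption: the boundary piece near $x$ is flat and passes through $x$ (curved $C^3$ pieces change the integrals only by bounded factors). Here $\omega_{n-2}$, the surface area of the unit sphere in $\mathbb{R}^{n-1}$, is notation introduced only for this illustration.

```latex
% n = 2, boundary a straight segment through x, 0 < \delta < 1:
\[
  \int_{-\delta}^{\delta} \big| \ln |\tau| \big| \, d\tau
  = 2 \int_0^{\delta} (-\ln \tau) \, d\tau
  = 2\,\delta\,(1 - \ln \delta) \;\longrightarrow\; 0
  \quad \text{as } \delta \to 0 .
\]
% n \ge 3, boundary a flat (n-1)-dimensional piece through x,
% using polar coordinates in R^{n-1}:
\[
  \int_{\{ y' \in \mathbb{R}^{n-1} : |y'| < \delta \}} |y'|^{2-n} \, dy'
  = \omega_{n-2} \int_0^{\delta} \tau^{2-n}\, \tau^{n-2} \, d\tau
  = \omega_{n-2}\, \delta \;\longrightarrow\; 0
  \quad \text{as } \delta \to 0 .
\]
```

In both cases the singularity of $G_n$ is integrable against the $(n-1)$-dimensional surface measure, which is what makes the integrals in (11) small for small $\delta$.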


In summary, we have 


Corollary 4 The locality principle, Theorem 5, holds in the case where $\Omega$ is a
bounded domain in $\mathbb{R}^n$ with piecewise $C^3$ boundary, and $S$ is any domain with
piecewise $C^3$ boundary such that $\Omega$ and $S$ are an exact geometric match on the
bounded subdomain $\Omega_0$ as in Definition 1. Moreover, we assume that:

1. Both $S$ and $\Omega$ satisfy the two-sided $(\varepsilon, h)$-cone condition;

2. There exists a constant $\sigma \in \mathbb{R}$ such that the second fundamental form satisfies
$\mathrm{II} \ge -\sigma$ on both $\partial S$ and $\partial\Omega$;

3. $S$ satisfies (6) and (7);

4. The Robin coefficient $c(x) \in L^\infty(\partial S \cup \partial\Omega)$ agrees on a common open bounded
neighborhood $\Omega_c$ in $\Omega$ and $S$.


Remark 2 In particular, all assumptions are satisfied if $\Omega$ is a bounded polygonal
domain in $\mathbb{R}^2$, and $S$ is a circular sector in $\mathbb{R}^2$.


The proof of Theorem 5 is accomplished by proving several estimates, in the
form of lemmas and propositions below. Since the domains $S$ and $\Omega$ satisfy the
two-sided $(\varepsilon, h)$-cone condition, there are Gaussian upper bounds for the corresponding
Neumann heat kernels as given in (3). With this in mind, define


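\[ F_1(t) := \sup_{(s,x,z) \in (0,t] \times W \times (\partial S \cap \partial\Omega)} |H^S(s,x,z) - H^\Omega(s,x,z)|, \]

\[ F_2(t) := \sup_{(s,x,z) \in (0,t] \times W \times (\partial S \setminus \partial\Omega)} |H^S(s,x,z)|, \]

\[ F_3(t) := \sup_{(s,x,z) \in (0,t] \times W \times (\partial\Omega \setminus \partial S)} |H^\Omega(s,x,z)|. \]

It now follows from (3) and Theorem 4 that

\[ F(t) := \max(F_1(t), F_2(t), F_3(t)) = O(t^\infty), \quad t \to 0. \tag{12} \]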
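The reason we require the Neumann heat kernels is because, as in [24, 35],⁶
the Robin heat kernels, $K^S(t,x,y)$ and $K^\Omega(t,x,y)$, can be expressed in terms of
$H^S(t,x,y)$ and $H^\Omega(t,x,y)$ in the following way. Define

\[ k_0^D(t,x,y) = H^D(t,x,y), \quad D = S \text{ and } D = \Omega, \]

and

\[ k_m^D(t,x,y) = \int_0^t \int_{\partial D} H^D(s,x,z)\, c(z)\, k_{m-1}^D(t-s,z,y) \, \sigma(dz)\, ds \tag{13} \]

for $m \in \mathbb{N}$. Then

\[ K^S(t,x,y) = \sum_{m=0}^{\infty} k_m^S(t,x,y), \qquad K^\Omega(t,x,y) = \sum_{m=0}^{\infty} k_m^\Omega(t,x,y). \]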


Let us define the function

\[ A(t,x) := \int_0^t \int_{\partial S} |H^S(s,x,z)\, c(z)| \, \sigma(dz)\, ds + \int_0^t \int_{\partial\Omega} |H^\Omega(s,x,z)\, c(z)| \, \sigma(dz)\, ds =: A_1(t,x) + A_2(t,x) \tag{14} \]

on $(0,1] \times W$. The following lemma, in particular, shows that $A(t,x)$ is a
well-defined function.


Lemma 1 The function A(t, x) is uniformly bounded on (0, 1] x W. 


⁶ We note that the result is stated for compact domains. However, the construction is purely formal
and works as long as the series converges. Under our assumptions, we shall prove that it does.
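Proof For $n = 1$ the lemma follows from (3). Hence, we assume here $n \ge 2$. For
any $x \in W$, $A_j(t,x)$, $j = 1, 2$, is an increasing function with respect to the variable
$t \in (0,1]$. Therefore, it is sufficient to prove that $A_j(x) := A_j(1,x)$ is bounded on
$W$, for $j = 1, 2$.

Let us choose $0 < \rho < \min(\alpha/2, 1)$. Without loss of generality, setting $C_1 = C_2 = 1$
in (3), we obtain

\[
\begin{aligned}
A_1(x) + A_2(x)
&\le \int_0^1 \int_{\partial S \setminus B(x,\rho)} s^{-n/2} e^{-\frac{|x-z|^2}{s}} |c(z)| \, \sigma(dz)\, ds
 + \int_0^1 \int_{\partial S \cap B(x,\rho)} s^{-n/2} e^{-\frac{|x-z|^2}{s}} |c(z)| \, \sigma(dz)\, ds \\
&\quad + \int_0^1 \int_{\partial\Omega \setminus B(x,\rho)} s^{-n/2} e^{-\frac{|x-z|^2}{s}} |c(z)| \, \sigma(dz)\, ds
 + \int_0^1 \int_{\partial\Omega \cap B(x,\rho)} s^{-n/2} e^{-\frac{|x-z|^2}{s}} |c(z)| \, \sigma(dz)\, ds \\
&=: J_1(x) + J_2(x) + J_3(x) + J_4(x).
\end{aligned}
\]

The boundedness of $J_1(x)$ on $W$ follows from (6) and the assumption that $c(z) \in L^\infty$.
For $J_3(x)$ we estimate using only that $\partial\Omega$ is bounded and thus, since it is
piecewise $C^3$, has finite measure,

\[ J_3(x) \le \|c\|_\infty \int_0^1 s^{-n/2} e^{-\frac{\rho^2}{s}} \int_{\partial\Omega \setminus B(x,\rho)} \sigma(dz)\, ds < \infty. \]

Since $\rho < \alpha/2$, $\partial S \cap B(x,\rho) = \partial\Omega \cap B(x,\rho)$ for $x \in W$, and hence by Fubini's
theorem and a change of variables

\[
\begin{aligned}
J_2(x) = J_4(x) &= \int_0^1 \int_{\partial\Omega \cap B(x,\rho)} s^{-n/2} e^{-\frac{|x-z|^2}{s}} |c(z)| \, \sigma(dz)\, ds \\
&\le \|c\|_\infty \int_{\partial\Omega \cap B(x,\rho)} \frac{1}{|x-z|^{n-2}} \int_{|x-z|^2}^{+\infty} \tau^{\frac{n}{2}-2} e^{-\tau} \, d\tau \, \sigma(dz).
\end{aligned}
\]

For $n > 2$, the second integral is uniformly bounded, and hence, (8) implies that
$J_2(x)$ and $J_4(x)$ are bounded on $W$. If on the other hand $n = 2$, then

\[ J_2(x) = J_4(x) = \int_{\partial\Omega \cap B(x,\rho)} |c(z)| \int_{|x-z|^2}^{+\infty} \tau^{-1} e^{-\tau} \, d\tau \, \sigma(dz). \]

Since $\rho < 1$, $\rho^2 < \rho < 1$, so we can write

\[
\begin{aligned}
J_2(x) = J_4(x) &\le \int_{\partial\Omega \cap B(x,\rho)} |c(z)| \int_{|x-z|^2}^{1} \tau^{-1} \, d\tau \, \sigma(dz)
 + \int_{\partial\Omega \cap B(x,\rho)} |c(z)| \int_1^{+\infty} e^{-\tau} \, d\tau \, \sigma(dz) \\
&\le \|c\|_\infty \int_{\partial\Omega \cap B(x,\rho)} \big| \ln |x-z|^2 \big| \, \sigma(dz) + \|c\|_\infty \int_{\partial\Omega \cap B(x,\rho)} \sigma(dz),
\end{aligned}
\]

which is finite by (8), the boundedness of $\Omega$ and the piecewise $C^3$ smoothness of
the boundary. □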
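
For the reader's convenience, the change of variables used for $J_2$ and $J_4$ in the proof of Lemma 1 can be spelled out. This is a worked sketch of that single step; the substitution variable is called $\tau$ here.

```latex
% Fix z \ne x and substitute \tau = |x-z|^2 / s, so that
%   s = |x-z|^2 / \tau,   ds = -|x-z|^2 \tau^{-2} d\tau,
% and s \in (0,1] corresponds to \tau \in [|x-z|^2, \infty).
\[
  \int_0^1 s^{-n/2} e^{-\frac{|x-z|^2}{s}} \, ds
  = \int_{|x-z|^2}^{\infty}
      \Big( \frac{|x-z|^2}{\tau} \Big)^{-n/2} e^{-\tau}\,
      \frac{|x-z|^2}{\tau^2} \, d\tau
  = \frac{1}{|x-z|^{n-2}}
      \int_{|x-z|^2}^{\infty} \tau^{\frac{n}{2}-2} e^{-\tau} \, d\tau .
\]
% For n > 2 the remaining integral is at most \Gamma(n/2 - 1) < \infty,
% so the bound reduces to the free Green's function |x-z|^{2-n}.
% For n = 2 the prefactor is 1 and the integrand \tau^{-1} e^{-\tau}
% produces the logarithm.
```

This is the point at which assumption (8) on the free Green's function enters.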


Corollary 5 In the notation of Lemma 1, we have

\[ \lim_{T \to 0} \sup_{(t,x) \in (0,T] \times W} A(t,x) = 0. \]


Proof Consider the functions $A_j(t,x)$. They are monotone increasing in $t$ for each
$x$, and they are continuous in $x$ for each $t$ by continuity of solutions to the heat
equation. We claim that as $t \to 0$, $A(t,x)$ approaches zero pointwise. To see this,
write the time integral from 0 to $t$ in each $A_j(t,x)$, $j = 1, 2$, as a time integral over
$[0,1]$ by multiplying the integrand by the characteristic function $\chi_{[0,t]}$. For example,

\[ A_1(t,x) = \int_0^1 \int_{\partial S} \chi_{[0,t]}(s)\, |H^S(s,x,z)\, c(z)| \, \sigma(dz)\, ds. \]

The integrands are bounded by $|H^S(s,x,z)\, c(z)|$, which is integrable by Lemma 1.
For each $x$, they converge to zero as $t \to 0$. So by the Dominated Convergence
Theorem applied to each $A_j(t,x)$, we see that $A(t,x) \to 0$ as $t \to 0$ for each $x$.
Now we have a monotone family of continuous functions converging pointwise
to a continuous function (zero) on the compact set $W$. By Dini's theorem, this
convergence is in fact uniform, which is precisely what we want. □


To use this, fix a small number $A$ to be chosen later. Then Corollary 5 allows us
to find $T > 0$ such that

\[ A(t,x) < A, \quad (t,x) \in (0,T] \times W. \tag{15} \]


Next we prove the following two auxiliary propositions. 


Proposition 2 The following inequality holds with $D = S$ and $D = \Omega$:

\[ \int_0^t \int_{\partial D} |k_m^D(s,x,z)\, c(z)| \, \sigma(dz)\, ds \le 2^{m+1} A^{m+1} \tag{16} \]

on $(0,T] \times W$, for any $m \in \mathbb{N}$. Moreover, an identical inequality holds when
$k_m^D(s,x,z)$ is replaced by $k_m^D(s,z,x)$.


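Proof By induction. For $m = 0$, recalling the definition of $A(t,x)$, (14),

\[ \int_0^t \int_{\partial D} |k_0^D(s,x,z)\, c(z)| \, \sigma(dz)\, ds = \int_0^t \int_{\partial D} |H^D(s,x,z)\, c(z)| \, \sigma(dz)\, ds \le A(t,x) < A. \]

We have thus verified the base case. Now, we assume that (16) holds for $k \le m$.
Consider $k = m+1$:

\[ \int_0^t \int_{\partial D} |k_{m+1}^D(s,x,z)\, c(z)| \, \sigma(dz)\, ds
 = \int_0^t \int_{\partial D} \left| \int_0^s \int_{\partial D} H^D(\tau,x,\zeta)\, k_m^D(s-\tau,\zeta,z)\, c(\zeta)\, c(z) \, \sigma(d\zeta)\, d\tau \right| \sigma(dz)\, ds. \]

Changing variables:

\[
\begin{aligned}
&\int_0^t \int_{\partial D} \left| \int_0^s \int_{\partial D} H^D(s-\tau,z,\zeta)\, k_m^D(\tau,\zeta,x)\, c(\zeta)\, c(z) \, \sigma(d\zeta)\, d\tau \right| \sigma(dz)\, ds \\
&\quad \le \int_0^t \int_{\partial D} \int_0^s \int_{\partial D} \big| H^D(s-\tau,z,\zeta)\, k_m^D(\tau,\zeta,x)\, c(\zeta)\, c(z) \big| \, \sigma(d\zeta)\, d\tau\, \sigma(dz)\, ds \\
&\quad \le \int_0^t \int_{\partial D} \int_0^t \int_{\partial D} \big| H^D(|s-\tau|,z,\zeta)\, k_m^D(\tau,\zeta,x)\, c(\zeta)\, c(z) \big| \, \sigma(d\zeta)\, d\tau\, \sigma(dz)\, ds \\
&\quad \le \int_0^t \int_{\partial D} \left( \int_0^t \int_{\partial D} |H^D(|s-\tau|,z,\zeta)\, c(z)| \, \sigma(dz)\, ds \right) |k_m^D(\tau,\zeta,x)\, c(\zeta)| \, \sigma(d\zeta)\, d\tau.
\end{aligned}
\tag{17}
\]

For the integrand, we compute

\[
\begin{aligned}
\int_0^t \int_{\partial D} |H^D(|s-\tau|,z,\zeta)\, c(z)| \, \sigma(dz)\, ds
&= \int_0^\tau \int_{\partial D} |H^D(|s-\tau|,z,\zeta)\, c(z)| \, \sigma(dz)\, ds + \int_\tau^t \int_{\partial D} |H^D(|s-\tau|,z,\zeta)\, c(z)| \, \sigma(dz)\, ds \\
&= \int_0^\tau \int_{\partial D} |H^D(\tau-s,z,\zeta)\, c(z)| \, \sigma(dz)\, ds + \int_0^{t-\tau} \int_{\partial D} |H^D(s,z,\zeta)\, c(z)| \, \sigma(dz)\, ds < 2A.
\end{aligned}
\]

Therefore, from the induction hypothesis and (17), we obtain

\[
\begin{aligned}
&\int_0^t \int_{\partial D} \left| \int_0^s \int_{\partial D} H^D(\tau,z,\zeta)\, k_m^D(s-\tau,\zeta,x)\, c(\zeta)\, c(z) \, \sigma(d\zeta)\, d\tau \right| \sigma(dz)\, ds \\
&\quad \le 2A \int_0^t \int_{\partial D} |k_m^D(\tau,\zeta,x)\, c(\zeta)| \, \sigma(d\zeta)\, d\tau \le 2A \cdot 2^{m+1} A^{m+1} = 2^{m+2} A^{m+2},
\end{aligned}
\]

as desired.
The estimates with $x$ and $z$ reversed are proved similarly. Note in particular that
the base case works because $k_0^D = H^D$ is a Neumann heat kernel and is thus
symmetric in its spatial arguments. □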
We need one more lemma concerning pointwise bounds for $k_m^D$, which uses the
geometric assumption (7).


Lemma 2 Let $D = S$ or $\Omega$. There exists $T_0 > 0$ such that for all $m$, all $t < T_0$, all
$x \in D$, and all $y \in D$,

\[ |k_m^D(t,x,y)| \le C_1 2^{-m} t^{-n/2} e^{-\frac{|x-y|^2}{C_2 t}}. \]

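Proof The proof proceeds by induction. The base case is $m = 0$, which is (3).
Now assume we have the result for $k = m$. Using the iterative formula (13), we
have

\[ |k_{m+1}^D(t,x,y)| \le \|c\|_\infty \int_0^t \int_{\partial D} |H^D(s,x,z)\, k_m^D(t-s,z,y)| \, \sigma(dz)\, ds. \tag{18} \]

Using (3) and the inductive hypothesis, we see that the integrand is bounded by

\[ C_1^2\, 2^{-m} s^{-n/2} (t-s)^{-n/2} \exp\left( -\frac{1}{C_2} \left( \frac{|x-z|^2}{s} + \frac{|z-y|^2}{t-s} \right) \right). \]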

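First assume that $D$ is a half-space. We do the estimate in the case $n = 2$, because
the general case is analogous. Hence, we use the coordinates $x = (x_1, x_2)$, $y = (y_1, y_2)$,
$z = (z_1, z_2)$, and estimate using $\{z_2 = 0\} \subset \mathbb{R}^2$ for $\partial D$. Dropping the
constant factors, and saving the integral with respect to time for later, we therefore
estimate

\[ \int_{\mathbb{R}} s^{-1} (t-s)^{-1} e^{-\frac{|x-z|^2}{C_2 s} - \frac{|y-z|^2}{C_2 (t-s)}} \, dz_1. \]

Without loss of generality, we shall assume that $x = (0,0)$. Then we are estimating

\[ \int_{\mathbb{R}} s^{-1} (t-s)^{-1} e^{\frac{-|z|^2 (t-s) - s |y-z|^2}{C_2 s (t-s)}} \, dz_1. \]

Since $z \in \partial D$, we have $z_2 = 0$. For the sake of simplicity, set $y_2 = 0$; the case
where $y_2$ is nonzero is similar. Given this assumption, we set

\[ z := z_1, \quad y := y_1, \]

and estimate

\[ \int_{\mathbb{R}} s^{-1} (t-s)^{-1} e^{\frac{-z^2 (t-s) - s (y-z)^2}{C_2 s (t-s)}} \, dz. \]

We do the standard trick of completing the square in the exponent. This gives

\[ \int_{\mathbb{R}} s^{-1} (t-s)^{-1} \exp\left[ -\left( \frac{z \sqrt{t}}{\sqrt{C_2}\sqrt{s}\sqrt{t-s}} - \frac{\sqrt{s}\, y}{\sqrt{C_2}\sqrt{t}\sqrt{t-s}} \right)^2 - \frac{y^2}{C_2 (t-s)} + \frac{s y^2}{C_2 t (t-s)} \right] dz. \]

We therefore compute the integral over $\mathbb{R}$ in the standard way, obtaining

\[ s^{-1/2} (t-s)^{-1/2} \frac{\sqrt{\pi C_2}}{\sqrt{t}}\, e^{\frac{-t y^2 + s y^2}{C_2 t (t-s)}} = s^{-1/2} (t-s)^{-1/2} \frac{\sqrt{\pi C_2}}{\sqrt{t}}\, e^{-\frac{y^2}{C_2 t}}. \]

Finally, we compute the integral with respect to $s$,

\[ \int_0^t \frac{1}{\sqrt{s}\sqrt{t-s}} \, ds = \pi. \]

Hence, the total expression is bounded from above by

\[ \frac{\sqrt{C_2}\, \pi^{3/2}}{\sqrt{t}}\, e^{-\frac{y^2}{C_2 t}}. \]

Since we had assumed that $x = 0$, we see that this is indeed

\[ \frac{\sqrt{C_2}\, \pi^{3/2}}{\sqrt{t}}\, e^{-\frac{|x-y|^2}{C_2 t}}. \]

Recalling the constant factors, we have

\[ |k_{m+1}^D(t,x,y)| \le C_1^2 \|c\|_\infty 2^{-m} \times \frac{\sqrt{C_2}\, \pi^{3/2}}{\sqrt{t}}\, e^{-\frac{|x-y|^2}{C_2 t}}. \]

Now we note that the power of $t$ is $t^{-(n-1)/2}$ for dimension $n = 2$. Hence, we
re-write the above estimate as

\[ |k_{m+1}^D(t,x,y)| \le C_1^2 \|c\|_\infty 2^{-m} \sqrt{t}\, t^{-1} \big( \sqrt{C_2}\, \pi^{3/2} \big)\, e^{-\frac{|x-y|^2}{C_2 t}}. \]

We then may choose for example

\[ T_0 = \frac{1}{4 (C_1+1)^2 (\|c\|_\infty + 1)^2 \pi^3 (C_2+1)} \implies \sqrt{T_0} = \frac{1}{2 (C_1+1) (\|c\|_\infty + 1) \pi^{3/2} \sqrt{C_2+1}}. \]

This ensures that

\[ |k_{m+1}^D(t,x,y)| \le C_1 2^{-m-1} t^{-1} e^{-\frac{|x-y|^2}{C_2 t}}, \quad n = 2. \]

We note that in general, for $\mathbb{R}^n$, by estimating analogously, noting that the integral
will be over $\mathbb{R}^{n-1}$, we obtain

\[ |k_{m+1}^D(t,x,y)| \le C_1^2 \|c\|_\infty 2^{-m} \sqrt{t}\, t^{-n/2} \big( C_2^{(n-1)/2} \pi^{(n+1)/2} \big)\, e^{-\frac{|x-y|^2}{C_2 t}}. \]

So, in the general-$n$ case, we let

\[ T_0 = \frac{1}{4 (C_1+1)^2 (\|c\|_\infty + 1)^2 \pi^{n+1} (C_2+1)^{n-1}}. \]

Then, for all $t < T_0$, we have

\[ |k_{m+1}^D(t,x,y)| \le C_1 2^{-m-1} t^{-n/2} e^{-\frac{|x-y|^2}{C_2 t}}. \]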

Now consider the case where $D$ is a general domain, not necessarily a half-space.
As before, we have

\[ |k_{m+1}^D(t,x,y)| \le C_1^2 \|c\|_\infty 2^{-m} \int_0^t \int_{\partial D} s^{-n/2} (t-s)^{-n/2} e^{-\frac{1}{C_2} \left( \frac{|x-z|^2}{s} + \frac{|z-y|^2}{t-s} \right)} \, \sigma(dz)\, ds. \tag{19} \]

We claim that the right-hand side of (19) is less than or equal to $C_D$, the constant
from (7), times the corresponding integral in the case where $D$ is a half-plane
through $x$ and $y$. Assuming this claim, we get the same bound as for a half-plane,
but with an extra $C_D$, and adjusting $T_0$ to absorb $C_D$ as well, by putting an extra
$(C_D + 1)^2$ in the denominator, completes the proof.

To prove this claim, we use the so-called layer cake representation: rewrite the
right-hand side of (19), without the outside constants, as

\[ \int_0^t s^{-n/2} (t-s)^{-n/2} \int_{\partial D} \int_0^\infty \chi_{\{f(s,t,x,y,z) < a\}}\, e^{-a} \, da\, \sigma(dz)\, ds, \tag{20} \]

where naturally

\[ f(s,t,x,y,z) = \frac{1}{C_2} \left( \frac{|x-z|^2}{s} + \frac{|z-y|^2}{t-s} \right). \]

The representation (20) may seem odd at first but reverts to (19) upon integration
in $a$. Switching the order of integration in (20) (valid by Fubini-Tonelli, since
everything is positive) and evaluating the $z$-integral, this becomes

\[ \int_0^t s^{-n/2} (t-s)^{-n/2} \int_0^\infty \mathcal{H}^{n-1}\big( \partial D \cap \{ z : f(s,t,x,y,z) < a \} \big)\, e^{-a} \, da\, ds. \tag{21} \]

Let us more closely examine the set $\{ z : f(s,t,x,y,z) < a \}$. It is the set where

\[ \left( 1 - \frac{s}{t} \right) |x-z|^2 + \frac{s}{t} |z-y|^2 < \frac{1}{t} C_2\, a\, s (t-s). \]

It is straightforward to compute that this set is in fact a ball centered at the point
$P(s,t,x,y) := (1 - \frac{s}{t}) x + \frac{s}{t} y$, with radius squared equal to

\[ R^2(s,t,x,y) = \max\left\{ 0,\ \frac{1}{t} C_2\, a\, s (t-s) - \frac{s}{t} \left( 1 - \frac{s}{t} \right) |y-x|^2 \right\}. \]

Therefore (21) equals

\[ \int_0^t s^{-n/2} (t-s)^{-n/2} \int_0^\infty \mathcal{H}^{n-1}\big( \partial D \cap B_n(P,R) \big)\, e^{-a} \, da\, ds. \tag{22} \]

By the assumption (7), this is bounded by

\[ C_D \int_0^t s^{-n/2} (t-s)^{-n/2} \int_0^\infty \operatorname{Vol}_{n-1}(B_{n-1}(P,R))\, e^{-a} \, da\, ds. \tag{23} \]

However, in the event that $D$ is a half-space with $x$ and $y \in \partial D$ (so also $P \in \partial D$),
we have $\partial D \cap B_n(P,R) = B_{n-1}(P,R)$, so (22) equals

\[ \int_0^t s^{-n/2} (t-s)^{-n/2} \int_0^\infty \operatorname{Vol}_{n-1}(B_{n-1}(P,R))\, e^{-a} \, da\, ds. \tag{24} \]

Therefore, the integral (22) for general $D$ is bounded by $C_D$ times the integral (22)
for a half-space. Since (22) is equal to the right-hand side of (19) without the
preceding constants, the claim is proven. This completes the proof of Lemma 2. □
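
The statement that the sublevel set $\{z : f < a\}$ is a ball rests on an elementary identity: with $\lambda = s/t$ and $P = (1-\lambda)x + \lambda y$, one has $(1-\lambda)|x-z|^2 + \lambda|z-y|^2 = |z-P|^2 + \lambda(1-\lambda)|x-y|^2$. The following is a small numerical sketch of that identity on random points; the helper names and the random test data are assumptions made only for illustration.

```python
import random

def dist2(u, v):
    """Squared Euclidean distance between two points given as lists."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

def weighted_form(x, y, z, s, t):
    """(1 - s/t)|x - z|^2 + (s/t)|z - y|^2, the left-hand side of the
    inequality describing the sublevel set of f."""
    lam = s / t
    return (1 - lam) * dist2(x, z) + lam * dist2(z, y)

def ball_form(x, y, z, s, t):
    """|z - P|^2 + (s/t)(1 - s/t)|x - y|^2 with P = (1 - s/t) x + (s/t) y."""
    lam = s / t
    p = [(1 - lam) * xi + lam * yi for xi, yi in zip(x, y)]
    return dist2(z, p) + lam * (1 - lam) * dist2(x, y)

def max_discrepancy(dim=3, trials=1000, seed=0):
    """Largest absolute difference between the two forms on random data."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x, y, z = ([rng.uniform(-2, 2) for _ in range(dim)] for _ in range(3))
        t = rng.uniform(0.1, 2.0)
        s = rng.uniform(0.0, t)
        worst = max(worst, abs(weighted_form(x, y, z, s, t) - ball_form(x, y, z, s, t)))
    return worst

if __name__ == "__main__":
    print(max_discrepancy())
```

Subtracting $\lambda(1-\lambda)|x-y|^2$ from the right-hand side of the inequality for the sublevel set gives the radius squared $R^2$ stated in the proof.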


Remark 3 The key is that the integral is half an order better in $t$ than the true heat
kernel, which is a critical feature of the difference between Robin and Neumann
heat kernels. It allows us to utilize the extra $\sqrt{t}$ to obtain the additional factor of
$2^{-m}$ which is required for the induction step in the next proposition.


Now, we establish the main estimate to prove Theorem 5. Let

\[ G(t) := \max\left\{ F(t),\ 2 C_1 t^{-n/2} e^{-\frac{(\alpha/2)^2}{C_2 t}} \right\}. \]

We note that of course we still have $G(t) = O(t^\infty)$ as $t \downarrow 0$.
Proposition 3 There exists $T > 0$ such that the estimate

\[ |k_m^S(t,x,y) - k_m^\Omega(t,x,y)| \le G(t) \cdot 7 \cdot 2^{-m} \tag{25} \]

holds for all $(t,x,y) \in (0,T] \times W \times \Omega_0$.


Proof We choose $T$ small enough so that $T < T_0$, with $T_0$ as in Lemma 2, and so that (15)
holds with $A = 1/4$.

Now proceed by induction. The base case is instantaneous by definition of $k_0$ and
of $F(t)$, using our locality principle for the Neumann case. So assume that (25) holds
for $k = m$; we will prove it for $k = m+1$. Using some algebraic manipulations,

\[ I := |k_{m+1}^S(t,x,y) - k_{m+1}^\Omega(t,x,y)| \le I_1 + I_2 + I_3, \]

where

\[
\begin{aligned}
I_1 &:= \int_0^t \int_{\partial S \cap \partial\Omega} \big| H^S(s,x,z)\, k_m^S(t-s,z,y) - H^\Omega(s,x,z)\, k_m^\Omega(t-s,z,y) \big|\, |c(z)| \, \sigma(dz)\, ds, \\
I_2 &:= \int_0^t \int_{\partial S \setminus \partial\Omega} |H^S(s,x,z)\, k_m^S(t-s,z,y)\, c(z)| \, \sigma(dz)\, ds, \\
I_3 &:= \int_0^t \int_{\partial\Omega \setminus \partial S} |H^\Omega(s,x,z)\, k_m^\Omega(t-s,z,y)\, c(z)| \, \sigma(dz)\, ds.
\end{aligned}
\]

We estimate these terms separately, beginning with $I_1$:

\[
\begin{aligned}
I_1 &\le \int_0^t \int_{\partial S \cap \partial\Omega} |H^S(s,x,z) - H^\Omega(s,x,z)|\, |k_m^S(t-s,z,y)|\, |c(z)| \, \sigma(dz)\, ds \\
&\quad + \int_0^t \int_{W \cap \partial S \cap \partial\Omega} |k_m^S(t-s,z,y) - k_m^\Omega(t-s,z,y)|\, |H^\Omega(s,x,z)|\, |c(z)| \, \sigma(dz)\, ds \\
&\quad + \int_0^t \int_{(\Omega \setminus W) \cap \partial S \cap \partial\Omega} |k_m^S(t-s,z,y) - k_m^\Omega(t-s,z,y)|\, |H^\Omega(s,x,z)|\, |c(z)| \, \sigma(dz)\, ds.
\end{aligned}
\]

The first term in the first integral is bounded by $F(t)$, since $x \in W$ and $z \in \partial S \cap \partial\Omega$, so
we may pull it out. We estimate the other term with Proposition 2 and get a bound
of $F(t) \cdot 2^{m+1} A^{m+1} = F(t) \cdot 2^{-m-1}$ for the first integral.

For the second integral, we pull out the supremum of the first term using the
inductive hypothesis. We estimate the other term using the definition of $A$ and we
get a bound of $G(t) \cdot 7 \cdot 2^{-m-2}$.

For the third integral, we use Lemma 2 to pull out the first term, ignoring the
difference and just estimating both $k$ terms separately. Since $|z - y| \ge \alpha/2$ on this
region, the supremum is less than $2^{-m} G(t)$ by Lemma 2. We estimate the other term
using the definition of $A$ and we get $1/4$, giving a bound of $2^{-m-2} G(t)$. Overall, we
have

\[ I_1 \le G(t) \big( 2^{-m-1} + 7 \cdot 2^{-m-2} + 2^{-m-2} \big). \]

Next we estimate the terms $I_2$, $I_3$. In each, we pull out the supremum of
$H(s,x,z)$ over the relevant region, and observe that it is bounded above by $F(t)$.
For the term remaining in the integral we use Proposition 2. Since $F(t) \le G(t)$, we
obtain a bound of $G(t) \cdot 2^{-m-1}$ for each of these two terms. Putting it all together,
we see

\[ I \le G(t) \big( 3 \cdot 2^{-m-1} + 7 \cdot 2^{-m-2} + 2^{-m-2} \big) = G(t) \cdot 2^{-m-1} \left( 3 + \frac{7}{2} + \frac{1}{2} \right) = G(t) \cdot 7 \cdot 2^{-m-1}, \]

as desired. □


Proof Finally, we prove Theorem 5. By Proposition 3,

\[ |K^S(t,x,y) - K^\Omega(t,x,y)| \le \sum_{m=0}^{\infty} |k_m^S(t,x,y) - k_m^\Omega(t,x,y)| \le \sum_{m=0}^{\infty} 7\, G(t)\, 2^{-m} = 14 \cdot G(t), \]

which is $O(t^\infty)$ as $t \to 0$. □
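
The bookkeeping of constants in the induction step of Proposition 3 and in the final geometric series can be checked with exact rational arithmetic. The following is a minimal sketch; the function names are hypothetical, and the three contributions are the bounds for $I_1$, $I_2$, $I_3$ stated in the proof, each divided by $G(t)$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def induction_step(m):
    """Sum of the bounds for I_1, I_2, I_3 (divided by G(t)) at level m + 1."""
    i1 = HALF ** (m + 1) + 7 * HALF ** (m + 2) + HALF ** (m + 2)
    i2 = HALF ** (m + 1)
    i3 = HALF ** (m + 1)
    return i1 + i2 + i3

def partial_sum(last_m):
    """Partial sum of the series sum_{m=0}^{last_m} 7 * 2^(-m)."""
    return sum(7 * HALF ** m for m in range(last_m + 1))

if __name__ == "__main__":
    # The induction closes: the level-(m+1) bound is exactly 7 * 2^(-m-1).
    print(all(induction_step(m) == 7 * HALF ** (m + 1) for m in range(10)))
    # The partial sums approach 14 from below.
    print(14 - partial_sum(20))
```

The factor 7 in (25) is exactly the value for which the three contributions reproduce themselves with an extra factor of $1/2$, so the series converges to $14\,G(t)$.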


3 Hearing the Corners of a Drum 


As a consequence of the work in the previous section, the locality principle holds for
both the Neumann and Robin boundary conditions when $\Omega$ is a bounded domain as
described in Theorem 1, and $S$ is either a whole space, a half-space, a circular sector,
or a smoothly bounded domain which is an exact geometric match for some piece
of $\Omega$. Therefore, to compute the heat trace expansion for $\Omega \subset \mathbb{R}^2$ satisfying the
hypotheses of Theorem 1, it suffices to chop the domain into pieces and, depending
on the piece, replace the true heat kernel with one of the following:


• the heat kernel for an infinite circular sector with the same opening angle and
boundary conditions near a corner of $\Omega$,

• the heat kernel for a smoothly bounded domain which is an exact match to $\Omega$
away from all the corners. Note that such a domain can be produced by rounding
off each corner.

Henceforth we consider the Neumann boundary condition or the classical Robin
boundary condition as in (2), so that in (5), $c(x)$ is a constant, specifically
$c(x) = \alpha/\beta$. In [23] we prove that the "corner contribution" for the Robin
boundary condition is identical to that for the Neumann boundary condition. In the
aforementioned work, we determine the Green's function for a circular sector of
infinite radius and opening angle $\gamma$, with the Neumann boundary condition in polar
coordinates:

\[
\begin{aligned}
G_\gamma(s, r, \phi, r_0, \phi_0) = \frac{1}{\pi^2} \int_0^\infty K_{i\mu}(r\sqrt{s})\, K_{i\mu}(r_0\sqrt{s}) \times \Big\{ &\cosh\big( (\pi - |\phi_0 - \phi|) \mu \big)
 + \frac{\sinh(\pi\mu)}{\sinh(\gamma\mu)} \cosh\big( (\phi + \phi_0 - \gamma) \mu \big) \\
&+ \frac{\sinh((\pi - \gamma)\mu)}{\sinh(\gamma\mu)} \cosh\big( (\phi - \phi_0) \mu \big) \Big\} \, d\mu.
\end{aligned}
\tag{26}
\]

Above, $K_{i\mu}$ is the modified Bessel function of the second kind, and $s$ is the spectral
parameter of the resolvent, $(\Delta + s)^{-1}$. The derivation of these formulas stems from
Fedosov's study of Kontorovich-Lebedev transforms [7] and shall be presented in
our forthcoming work [23]. Using functional calculus techniques, as we do in [23],
one may rigorously justify the statement that

\[ H(t, r, \phi, r_0, \phi_0) = \mathcal{L}^{-1}\big( G(s, r, \phi, r_0, \phi_0) \big)(t), \]

where $H$ denotes the heat kernel and $\mathcal{L}^{-1}$ denotes the inverse Laplace transform
taken with respect to $s$. This allows us to pass from the Green's functions to the heat
kernels on a sector, and we may then compute the short time asymptotic expansions
of the heat traces using our locality principles.


3.1 Heat Trace Calculations 


Let $\Omega$ be a domain with corners as described in Theorem 1. Assume that $\Omega$ has $n$
corners. Let $\mathcal{N}_i$ be a neighborhood of the $i$th corner consisting of a circular sector
of radius $R$, with $R$ sufficiently small so that each $\mathcal{N}_i$ can be taken to equal $\Omega_0$ in
the definition of exact geometric match corresponding to $S_i$, where $S_i$ is the infinite
circular sector of interior angle equal to the angle $\theta_i$. Then let $U$ be a smoothly
bounded domain such that $U$ can be taken equal to $S$ in the definition of exact
geometric match, with $\Omega_0 = \Omega \setminus \{\mathcal{N}_i\}_{i=1}^n$. By our locality principles, the heat trace

\[ \int_\Omega H^\Omega(t,z,z) \, dz = \int_{\Omega \setminus \bigcup_{i=1}^n \mathcal{N}_i} H^U(t,z,z) \, dz + \sum_{i=1}^n \int_{\mathcal{N}_i} H^{S_i}(t,z,z) \, dz + O(t^\infty). \tag{27} \]
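The calculation of the asymptotics of the integral of $H^U(t,z,z)$ is well-known
and may be extracted from [21], [25] and [35]. More interesting is the calculation
of the heat trace near the corners. Let us define:

\[
\begin{aligned}
A &:= \int_0^\infty K_{i\mu}(r\sqrt{s})\, K_{i\mu}(r_0\sqrt{s}) \cosh\big( (\pi - |\phi_0 - \phi|) \mu \big) \, d\mu, \\
B &:= \int_0^\infty K_{i\mu}(r\sqrt{s})\, K_{i\mu}(r_0\sqrt{s}) \frac{\sinh(\pi\mu)}{\sinh(\gamma\mu)} \cosh\big( (\phi + \phi_0 - \gamma) \mu \big) \, d\mu, \\
C &:= \int_0^\infty K_{i\mu}(r\sqrt{s})\, K_{i\mu}(r_0\sqrt{s}) \frac{\sinh((\pi - \gamma)\mu)}{\sinh(\gamma\mu)} \cosh\big( (\phi - \phi_0) \mu \big) \, d\mu.
\end{aligned}
\]

With this terminology, the Neumann Green's function for an infinite sector is
given by

\[ \frac{1}{\pi^2} (A + B + C). \]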


We shall compute the heat trace contributions from each of these terms. In each 
calculation, we will take the inverse Laplace transform, restrict to the angular 
diagonal ¢ = ¢9 (which commutes with # =1y, integrate in @ (same), restrict to 
r = ro, and integrate in r. 


3.1.1 Heat Trace Contribution from the A Term 


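Setting $\phi = \phi_0$, we have by Gradshteyn and Ryzhik [10, 6.794.1]


\[
\int_0^\infty K_{ix}(r\sqrt{s})\, K_{ix}(r_0\sqrt{s}) \cosh(\pi x)\, dx = \frac{\pi}{2}\, K_0\big(|r - r_0|\sqrt{s}\big).
\]


Then, by Erdelyi et al. [6, 5.16.35], we have


\[
\mathcal{L}^{-1}[A](t) = \mathcal{L}^{-1}\Big[\frac{\pi}{2}\, K_0\big(|r - r_0|\sqrt{s}\big)\Big](t)
= \frac{\pi}{2} \cdot \frac{1}{2t}\, e^{-|r - r_0|^2/4t}.
\]


Hence for $\phi = \phi_0$,


\[
\frac{1}{\pi^2}\, \mathcal{L}^{-1}[A](t) = \frac{e^{-|r - r_0|^2/4t}}{4\pi t}. \tag{28}
\]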


Setting $r = r_0$ gives $(4\pi t)^{-1}$, and integrating over $\mathcal{N}_i$, the contribution from this
term to the heat trace is the usual area term:


\[
\frac{A(\mathcal{N}_i)}{4\pi t}.
\]


3.1.2 Heat Trace Contribution from the B Term


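Now we investigate the contribution from B. The first simplification is to restrict to
$\phi = \phi_0$, then compute


\[
\int_0^\gamma B|_{\phi = \phi_0}\, d\phi = \int_0^\gamma \int_0^\infty K_{ix}(r\sqrt{s})\, K_{ix}(r_0\sqrt{s})\,
\frac{\sinh(\pi x)}{\sinh(\gamma x)} \cosh\big((2\phi - \gamma)x\big)\, dx\, d\phi.
\]


The only dependence on the angle is in the cosh term, which may be explicitly
integrated, and we obtain


\[
\int_0^\gamma B|_{\phi = \phi_0}\, d\phi = \int_0^\infty K_{ix}(r\sqrt{s})\, K_{ix}(r_0\sqrt{s})\, \frac{\sinh(\pi x)}{x}\, dx
= \frac{\pi^2}{2}\, I_0(r_0\sqrt{s})\, K_0(r\sqrt{s}),
\]


where in the last equality we have used [10, 6.794.10]. Now take the inverse Laplace
transform:


\[
\mathcal{L}^{-1}\Big[\int_0^\gamma B|_{\phi = \phi_0}\, d\phi\Big](t)
= \frac{\pi^2}{2}\, \mathcal{L}^{-1}\big[I_0(r_0\sqrt{s})\, K_0(r\sqrt{s})\big](t)
= \frac{\pi^2}{2} \cdot \frac{1}{2t}\, e^{-(r^2 + r_0^2)/4t}\, I_0\Big(\frac{r r_0}{2t}\Big). \tag{29}
\]


Thus, we see that


\[
\frac{1}{\pi^2}\, \mathcal{L}^{-1}\Big[\int_0^\gamma B|_{\phi = \phi_0}\, d\phi\Big](t)
= \frac{1}{4t}\, e^{-(r^2 + r_0^2)/4t}\, I_0\Big(\frac{r r_0}{2t}\Big). \tag{30}
\]


To compute the trace, we set $r = r_0$ and make a change of variables, by setting


\[
u = \frac{r^2}{2t}, \qquad du = \frac{r}{t}\, dr.
\]


Therefore,


\[
\frac{1}{4t} \int_0^R e^{-r^2/2t}\, I_0\Big(\frac{r^2}{2t}\Big)\, r\, dr = \frac{1}{4} \int_0^{R^2/2t} e^{-u} I_0(u)\, du.
\]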
By Watson [34, p. 79 (3)] with $\nu = 1$,


\[
u I_1'(u) + I_1(u) = u I_0(u). \tag{31}
\]


By Watson [34, p. 79 (4)] with $\nu = 0$,


\[
u I_0'(u) = u I_1(u). \tag{32}
\]


We use these to compute


\[
\begin{aligned}
\frac{d}{du}\Big( e^{-u} u \big(I_0(u) + I_1(u)\big) \Big)
&= e^{-u}\big( -u I_0(u) - u I_1(u) + I_0(u) + I_1(u) + u I_0'(u) + u I_1'(u) \big) \\
&= e^{-u}\big( -u I_1(u) + I_0(u) + u I_0'(u) \big) \quad \text{(by (31))} \\
&= e^{-u} I_0(u) \quad \text{(by (32))}.
\end{aligned}
\]


Next, define


\[
g(u) = e^{-u} u \big(I_0(u) + I_1(u)\big), \tag{33}
\]


and note that we have computed


\[
g'(u) = e^{-u} I_0(u).
\]


We therefore have


\[
\int_0^{R^2/2t} e^{-u} I_0(u)\, du = g(R^2/2t) - g(0).
\]


Since $I_0(0) = 1$ and $I_1(0) = 0$ [34], it follows that $g(0) = 0$, and we therefore
compute that


\[
\int_0^{R^2/2t} e^{-u} I_0(u)\, du = g(R^2/2t) = e^{-R^2/2t}\, \frac{R^2}{2t}\, \big( I_0(R^2/2t) + I_1(R^2/2t) \big).
\]


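For large arguments, the Bessel functions admit the following asymptotic expansions
(see [34])


\[
I_j(x) = \frac{e^x}{\sqrt{2\pi x}} \Big( 1 - \frac{4j^2 - 1}{8x} + O(x^{-2}) \Big), \qquad x \gg 0, \quad j = 0, 1.
\]


Consequently, for $x = R^2/2t$,


\[
g\Big(\frac{R^2}{2t}\Big) = \frac{R^2}{2t}\, e^{-R^2/2t} \Big( I_0\Big(\frac{R^2}{2t}\Big) + I_1\Big(\frac{R^2}{2t}\Big) \Big)
= \frac{R^2}{2t} \cdot \frac{2}{\sqrt{\pi R^2/t}}\, \big(1 + O(t)\big)
= \frac{R}{\sqrt{\pi t}} + O(\sqrt{t}), \qquad t \downarrow 0.
\]


Recalling the factor of $\frac{1}{4}$ we see that the trace of B contributes


\[
\frac{R}{4\sqrt{\pi t}} + O(\sqrt{t}), \qquad t \downarrow 0. \tag{34}
\]


Observe that this is precisely the usual perimeter term, since the part of the boundary
of the sector lying on $\partial\Omega$ consists of two edges of length R:


\[
\frac{|\partial\mathcal{N}_i \cap \partial\Omega|}{8\sqrt{\pi t}} + O(\sqrt{t}).
\]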
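

The two facts used for the B term, namely the closed form of the integral of $e^{-u} I_0(u)$ and the resulting small-time behaviour, can be checked numerically. The following is a minimal sketch in pure Python; the helpers `bessel_i` (a truncated power series) and `simpson` are illustrative assumptions introduced here and are not part of the original argument.

```python
import math

def bessel_i(order, x, terms=200):
    """Modified Bessel function I_order(x), integer order >= 0, via its power series."""
    half = x / 2.0
    term = half ** order / math.factorial(order)  # k = 0 term of the series
    total = 0.0
    for k in range(terms):
        total += term
        term *= half * half / ((k + 1) * (k + 1 + order))
    return total

def g(u):
    """g(u) = e^{-u} u (I_0(u) + I_1(u)), as in (33)."""
    return math.exp(-u) * u * (bessel_i(0, u) + bessel_i(1, u))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

# Identity: the integral of e^{-u} I_0(u) over [0, x] should equal g(x).
x = 5.0
integral = simpson(lambda u: math.exp(-u) * bessel_i(0, u), 0.0, x)

# Small-time behaviour: (1/4) g(R^2 / 2t) should be close to R / (4 sqrt(pi t)).
R, t = 1.0, 0.01
b_trace = 0.25 * g(R * R / (2.0 * t))
perimeter_term = R / (4.0 * math.sqrt(math.pi * t))
```

The power series is adequate only for moderate arguments; for very small t a scaled Bessel routine from a numerical library would be the safer choice.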


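3.1.3 Heat Trace Contribution from the C Term


Next, we compute the trace of the C term. This is done following [29]. The cosh
term drops out when $\phi = \phi_0$. Integrating with respect to the angle gives a factor of
$\gamma$. We define


\[
\mathcal{R}(t) = -\mathcal{L}^{-1}\Big( \frac{\gamma}{\pi^2} \int_0^\infty \frac{\sinh\big((\pi - \gamma)x\big)}{\sinh(\gamma x)}
\int_R^\infty K_{ix}^2(r\sqrt{s})\, r\, dr\, dx \Big)(t).
\]


It is shown in [29] that


\[
\mathcal{R}(t) = O(e^{-c/t}),
\]


and in fact an estimate is also obtained there for the constant $c > 0$. Hence, it suffices
to compute


\[
\mathcal{L}^{-1}\Big( \frac{\gamma}{\pi^2} \int_0^\infty \frac{\sinh\big((\pi - \gamma)x\big)}{\sinh(\gamma x)}
\int_0^\infty K_{ix}^2(r\sqrt{s})\, r\, dr\, dx \Big).
\]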


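Here we use [10, 6.521.3]. As in that notation we have $a = \sqrt{s} = b$, we must compute
instead the limit of the expression as $b \to a$,


\[
\frac{\pi (ab)^{-\nu} (a^\nu + b^\nu)}{2\sin(\nu\pi)(a + b)} \cdot \frac{f(a) - f(b)}{a - b}, \qquad f(x) = x^\nu.
\]


Then, since


\[
f'(x) = \nu x^{\nu - 1},
\]


we have


\[
\lim_{b \to a} \frac{\pi (ab)^{-\nu} (a^\nu + b^\nu)}{2\sin(\nu\pi)(a + b)} \cdot \frac{f(a) - f(b)}{a - b}
= \frac{\pi a^{-2\nu} (2a^\nu)}{4\sin(\nu\pi)\, a}\, \nu a^{\nu - 1}
= \frac{\pi\nu}{2\sin(\nu\pi)\, a^2}.
\]


Inserting our parameters, we have that


\[
\int_0^\infty K_{ix}^2(r\sqrt{s})\, r\, dr = \frac{\pi x}{2\sinh(\pi x)\, s}.
\]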


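So we must compute


\[
\mathcal{L}^{-1}\Big( \frac{\gamma}{\pi^2} \int_0^\infty \frac{\sinh\big((\pi - \gamma)x\big)}{\sinh(\gamma x)} \cdot
\frac{\pi x}{2s \sinh(\pi x)}\, dx \Big).
\]


This calculation has been done in [29, p. 122] using [10]; we have independently
verified these calculations as well. The result is given in [29, (2.10)]:


\[
\frac{\pi^2 - \gamma^2}{24\pi\gamma}.
\]


Thus, we see that C contributes to the trace the usual “corner contribution”:


\[
\frac{\pi^2 - \gamma^2}{24\pi\gamma} + O(t^\infty). \tag{35}
\]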
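

Since the inverse Laplace transform of $1/s$ is the constant 1, the corner contribution reduces to a single integral in x, which can be compared numerically with the closed form $(\pi^2 - \gamma^2)/(24\pi\gamma)$. The sketch below does this with a Simpson rule; the function names, the cutoff and the number of nodes are assumptions chosen for illustration.

```python
import math

def corner_integrand(x, gamma):
    """Integrand in x after the r-integration, with the factor 1/s removed."""
    if x == 0.0:
        return (math.pi - gamma) / (2.0 * gamma)  # limiting value as x -> 0
    ratio = math.sinh((math.pi - gamma) * x) / math.sinh(gamma * x)
    return ratio * math.pi * x / (2.0 * math.sinh(math.pi * x))

def corner_numeric(gamma, cutoff=40.0, n=40000):
    """(gamma / pi^2) times the Simpson approximation of the integral on [0, cutoff]."""
    h = cutoff / n
    total = corner_integrand(0.0, gamma) + corner_integrand(cutoff, gamma)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * corner_integrand(i * h, gamma)
    return gamma / math.pi ** 2 * total * h / 3.0

def corner_closed_form(gamma):
    """The corner contribution (pi^2 - gamma^2) / (24 pi gamma)."""
    return (math.pi ** 2 - gamma ** 2) / (24.0 * math.pi * gamma)

# Compare the two expressions for a few interior angles.
angles = [math.pi / 2.0, 2.0 * math.pi / 3.0, 2.0 * math.pi]
comparison = {a: (corner_numeric(a), corner_closed_form(a)) for a in angles}
```

The integrand decays exponentially, so truncating at a finite cutoff is harmless for angles that are not very small; for small angles the cutoff would need to grow.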


3.2 Robin Boundary Condition 


The Robin heat kernel has an additional contribution from the boundary. In [2,
(3.19)], and more classically [3, §14.2], the Robin heat kernel for a half-space is
computed, and it is equal to the Neumann heat kernel plus one additional term. This
term is


\[
E = -\frac{\alpha}{\beta}\, \frac{1}{\sqrt{4\pi t}}\, e^{-(x - x')^2/4t}\, e^{\alpha(y + y')/\beta}\, e^{\alpha^2 t/\beta^2}\,
\mathrm{erfc}\Big( \frac{y + y'}{\sqrt{4t}} + \frac{\alpha}{\beta}\sqrt{t} \Big).
\]


3.2.1 Trace of the E Term 


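The restriction of E to the diagonal yields


\[
-\frac{\alpha}{\beta\sqrt{4\pi t}}\, e^{2\alpha y/\beta}\, e^{\alpha^2 t/\beta^2}\,
\mathrm{erfc}\Big( \frac{y}{\sqrt{t}} + \frac{\alpha}{\beta}\sqrt{t} \Big).
\]


We have to integrate this over the semicircle $x^2 + y^2 \le R^2$. Doing the x-integration
first yields


\[
-\frac{\alpha}{\beta\sqrt{\pi t}}\, e^{\alpha^2 t/\beta^2} \int_0^R \sqrt{R^2 - y^2}\; e^{2\alpha y/\beta}\,
\mathrm{erfc}\Big( \frac{y}{\sqrt{t}} + \frac{\alpha}{\beta}\sqrt{t} \Big)\, dy,
\]


so, substituting $y = u\sqrt{t}$, we obtain


\[
-\frac{\alpha}{\sqrt{\pi}\,\beta}\, e^{\alpha^2 t/\beta^2} \int_0^{R/\sqrt{t}} e^{2\alpha u\sqrt{t}/\beta}\, \sqrt{R^2 - t u^2}\;
\mathrm{erfc}\Big( u + \frac{\alpha}{\beta}\sqrt{t} \Big)\, du.
\]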


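We shall use integration by parts, noting that


\[
\frac{d}{dz}\Big( z\, \mathrm{erfc}(z) - \frac{e^{-z^2}}{\sqrt{\pi}} \Big) = \mathrm{erfc}(z).
\]


So,


\[
\begin{aligned}
&\int_0^{R/\sqrt{t}} e^{2\alpha u\sqrt{t}/\beta}\, \sqrt{R^2 - t u^2}\; \mathrm{erfc}\big(u + \alpha\sqrt{t}/\beta\big)\, du \\
&\quad = \Big[ e^{2\alpha\sqrt{t}\,u/\beta}\, \sqrt{R^2 - t u^2}\, \Big( \big(u + \alpha\sqrt{t}/\beta\big)\, \mathrm{erfc}\big(u + \alpha\sqrt{t}/\beta\big)
- \frac{e^{-(u + \alpha\sqrt{t}/\beta)^2}}{\sqrt{\pi}} \Big) \Big]_{u=0}^{R/\sqrt{t}} \\
&\qquad - \int_0^{R/\sqrt{t}} \Big( \frac{2\alpha\sqrt{t}}{\beta} - \frac{t u}{R^2 - t u^2} \Big)\, e^{2\alpha\sqrt{t}\,u/\beta}\, \sqrt{R^2 - t u^2}\,
\Big( \big(u + \alpha\sqrt{t}/\beta\big)\, \mathrm{erfc}\big(u + \alpha\sqrt{t}/\beta\big)
- \frac{e^{-(u + \alpha\sqrt{t}/\beta)^2}}{\sqrt{\pi}} \Big)\, du.
\end{aligned}
\]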


It is a straightforward exercise to prove that


\[
\int_0^{R/\sqrt{t}} e^{2\alpha\sqrt{t}\,u/\beta}\, \mathrm{erfc}\big(u + \alpha\sqrt{t}/\beta\big)\, du
\]


is uniformly bounded as $t \downarrow 0$. Hence the second term is $O(\sqrt{t})$ as $t \downarrow 0$, for we can
pull out a factor of $\sqrt{t}$ and keep every other term in the integrand bounded above
by a constant. However, as $t \downarrow 0$, the first term converges to $R\pi^{-1/2}$. Hence, the E
term gives a contribution of


\[
-\frac{\alpha R}{\pi\beta} + O(\sqrt{t}).
\]


This is the usual perimeter term in the Robin setting [35].
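

The limit $R\pi^{-1/2}$ can also be observed numerically. The sketch below evaluates the u-integral for decreasing t with a Simpson rule, writing `h` for the ratio $\alpha/\beta$; the truncation of the integration range at `cutoff` relies on the rapid decay of erfc and, like the function name, is an assumption made for illustration.

```python
import math

def e_term_integral(R, t, h, cutoff=12.0, n=6000):
    """Simpson approximation of the integral over [0, R/sqrt(t)] of
    exp(2 h sqrt(t) u) * sqrt(R^2 - t u^2) * erfc(u + h sqrt(t)),
    where h stands for alpha / beta."""
    root_t = math.sqrt(t)
    upper = min(R / root_t, cutoff)  # erfc is negligible beyond the cutoff

    def f(u):
        return (math.exp(2.0 * h * root_t * u)
                * math.sqrt(max(R * R - t * u * u, 0.0))
                * math.erfc(u + h * root_t))

    step = upper / n
    total = f(0.0) + f(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * step)
    return total * step / 3.0

# The values should approach R / sqrt(pi) as t decreases.
R, h = 1.0, 1.0
limit = R / math.sqrt(math.pi)
errors = [abs(e_term_integral(R, t, h) - limit) for t in (1e-2, 1e-4, 1e-6)]
```

Multiplying the limit by the prefactor $-\alpha/(\sqrt{\pi}\beta)$ recovers the contribution $-\alpha R/(\pi\beta)$.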

The Robin heat trace asymptotics also have a contribution from the corners.
Though we cannot calculate it explicitly here, we do so using geometric microlocal
analysis in [23]. We obtain that the “corner contribution” from an interior angle $\gamma$
is the same as in the Neumann case,


\[
\frac{\pi^2 - \gamma^2}{24\pi\gamma}.
\]


3.3 Heat Trace Expansions and Proof of Theorem 1 


It is now straightforward to use (27) to compute the heat trace asymptotics for $\Omega$
by combining our explicit computations of the integrals over $\mathcal{N}_i$ with the known
asymptotics [21, 35] for the integrals over $\Omega \setminus \{\mathcal{N}_i\}$. In addition to the “corner
contribution” from the parametrix at the corner, there is also a contribution from the
model used for the smooth parts of the domain, as turning the corner contributes to
the curvature:


\[
-\frac{\pi - \theta_i}{12\pi}, \quad \text{for an interior angle } \theta_i.
\]


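We therefore obtain:


(N) for the Neumann boundary condition,


\[
\operatorname{tr} e^{-t\Delta} \sim \frac{|\Omega|}{4\pi t} + \frac{|\partial\Omega|}{8\sqrt{\pi t}} + \frac{\chi(\Omega)}{6} - \frac{n}{12}
+ \sum_{k=1}^n \frac{\pi^2 + \theta_k^2}{24\pi\theta_k} + O(\sqrt{t}),
\]


(R) for the Robin boundary condition,


\[
\operatorname{tr} e^{-t\Delta} \sim \frac{|\Omega|}{4\pi t} + \frac{|\partial\Omega|}{8\sqrt{\pi t}} + \frac{\chi(\Omega)}{6}
- \frac{|\partial\Omega|\,\alpha}{2\pi\beta} - \frac{n}{12}
+ \sum_{k=1}^n \frac{\pi^2 + \theta_k^2}{24\pi\theta_k} + O(\sqrt{t}).
\]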


Now let $\Omega$ be a smoothly bounded domain in the plane. The heat trace expansions
have been computed by McKean and Singer [21] for the Neumann boundary
condition and [35] for the Robin condition. These are, respectively,

(N) for the Neumann boundary condition,


\[
\operatorname{tr} e^{-t\Delta} \sim \frac{|\Omega|}{4\pi t} + \frac{|\partial\Omega|}{8\sqrt{\pi t}} + \frac{\chi(\Omega)}{6} + O(\sqrt{t}),
\]


(R) for the Robin boundary condition,


\[
\operatorname{tr} e^{-t\Delta} \sim \frac{|\Omega|}{4\pi t} + \frac{|\partial\Omega|}{8\sqrt{\pi t}} + \frac{\chi(\Omega)}{6}
- \frac{|\partial\Omega|\,\alpha}{2\pi\beta} + O(\sqrt{t}).
\]


Proof (Theorem 1) If two domains are isospectral, then they have the same heat
trace. Hence, for each power of t in such an expansion, the coefficient must be
identical for both domains. Now, let us assume that $\Omega$ satisfies the assumptions
in Theorem 1, so it has at least one corner of interior angle not equal to $\pi$. Let
the interior angles at the corners be $\{\theta_k\}_{k=1}^n$. Let us assume that $\tilde{\Omega}$ is a smoothly
bounded domain, and that we have taken the same boundary condition for the
Laplacian for both $\Omega$ and $\tilde{\Omega}$. Assume for the sake of contradiction that $\Omega$ and $\tilde{\Omega}$
are isospectral. Therefore, their heat trace coefficients coincide. Hence, they have
the same area and perimeter. Since the same boundary condition is taken for both
domains, and thus the same values of $\alpha$ and $\beta$ in the Robin case, we should have


\[
\frac{\chi(\Omega)}{6} - \frac{n}{12} + \sum_{k=1}^n \frac{\pi^2 + \theta_k^2}{24\pi\theta_k} = \frac{\chi(\tilde{\Omega})}{6}. \tag{36}
\]


We have assumed that $\Omega$ is simply connected, but we make no such assumption on
$\tilde{\Omega}$. Hence


\[
\chi(\Omega) = 1, \qquad \chi(\tilde{\Omega}) \le 1.
\]


Following the argument on pp. 91-92 of [16],


\[
\frac{\chi(\Omega)}{6} - \frac{n}{12} + \sum_{k=1}^n \frac{\pi^2 + \theta_k^2}{24\pi\theta_k} > \frac{1}{6} \ge \frac{\chi(\tilde{\Omega})}{6},
\]


which violates (36). □
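

The strict inequality in the proof rests on the elementary bound $\pi^2 + \theta^2 \ge 2\pi\theta$, with equality only for $\theta = \pi$, so that each corner term is at least $1/12$. The sketch below evaluates the constant coefficient for two simply connected polygons and compares it with the value $1/6$ that a smoothly bounded, simply connected domain would give; the function names are hypothetical and the Robin term, which is the same for both domains, is omitted.

```python
import math

def corner_constant(theta):
    """(pi^2 + theta^2) / (24 pi theta) for one corner of interior angle theta."""
    return (math.pi ** 2 + theta ** 2) / (24.0 * math.pi * theta)

def constant_term_with_corners(angles, euler_characteristic=1):
    """chi/6 - n/12 + sum over corners of (pi^2 + theta_k^2) / (24 pi theta_k)."""
    n = len(angles)
    corner_sum = sum(corner_constant(theta) for theta in angles)
    return euler_characteristic / 6.0 - n / 12.0 + corner_sum

square = [math.pi / 2.0] * 4    # four right angles
triangle = [math.pi / 3.0] * 3  # equilateral triangle

smooth_value = 1.0 / 6.0        # chi/6 with chi = 1 and no corners
gaps = {
    "square": constant_term_with_corners(square) - smooth_value,
    "triangle": constant_term_with_corners(triangle) - smooth_value,
}
```

Both gaps are strictly positive, and a corner of angle exactly $\pi$ would contribute nothing beyond $1/12$, which is why the theorem requires at least one interior angle different from $\pi$.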


4 Microlocal Analysis in the Curvilinear Case 


It turns out that the heat trace expansions above are also valid for curvilinear 
polygons, once terms accounting for the curvature of the boundary away from the 
corners have been included. Although this has been demonstrated in [16] for the
Dirichlet boundary condition using monotonicity, it becomes a much more subtle
matter for the Neumann and Robin boundary conditions.

The main problem is that for curvilinear polygons, we no longer have an exact 
geometric match. Hence, we can no longer use the locality principle to compute 
the heat trace expansion, because there are no known expressions for the heat 
kernels. For classical polygons, one may compute the Neumann heat trace using 
the Dirichlet heat trace together with the trace of a Euclidean surface with conical 
singularities created by doubling the polygon. However, this technique fails once the 
edges of the polygon are no longer necessarily straight near the corners. Therefore, 
in order to compute the short time asymptotic expansion of the heat trace without 
exact geometric matches, we turn to the robust techniques of geometric microlocal 
analysis. This allows us to give a full description of the Dirichlet, Neumann, and 
Robin heat kernels on a curvilinear polygon in all asymptotic regimes. Restricting 
to the diagonal and integrating yields the heat trace. 

In order to describe the heat kernel in all asymptotic regimes, we build a space,
called the heat space or double heat space, on which the heat kernel is well-behaved.
This space is built by blowing up various p-submanifolds of $\Omega \times \Omega \times [0, \infty)$. To
see why this is needed, first consider the heat kernel (1) on $\mathbb{R}^n$. At the diagonal
in $\mathbb{R}^n \times \mathbb{R}^n \times [0, \infty)$, the heat kernel behaves as $O(t^{-n/2})$ as $t \downarrow 0$. However,
as long as $d(z, z') > \epsilon > 0$, the heat kernel behaves as $O(t^\infty)$ as $t \downarrow 0$. So the
heat kernel fails to be well-behaved at $\{z = z', t = 0\}$. This is the motivation for
“blowing up” the diagonal $\{z = z'\}$ at $t = 0$, which means replacing this diagonal
with its inward pointing spherical normal bundle, corresponding to the introduction
of “polar coordinates”. The precise meaning of “blowing up” is explained in [20],
and in this particular case of blowing up $\{z = z'\}$ at $t = 0$ in $\mathbb{R}^n \times \mathbb{R}^n \times [0, \infty)$, see
[20, Chapter 7].
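
The contrast between the two regimes is easy to see numerically. The sketch below assumes that the heat kernel (1) is the standard Euclidean kernel $(4\pi t)^{-n/2} e^{-|z - z'|^2/4t}$ and tabulates it on the diagonal and at a fixed positive distance; the function name and the sample values are illustrative.

```python
import math

def euclidean_heat_kernel(t, distance, n=2):
    """Standard heat kernel on R^n at time t for two points a given distance apart."""
    return (4.0 * math.pi * t) ** (-n / 2.0) * math.exp(-distance ** 2 / (4.0 * t))

times = [1e-1, 1e-2, 1e-3]

# On the diagonal the kernel grows like t^{-n/2}.
on_diagonal = [euclidean_heat_kernel(t, 0.0) for t in times]

# Away from the diagonal it decays faster than any power of t;
# dividing by t^5 still leaves a sequence that tends to zero.
off_diagonal_scaled = [euclidean_heat_kernel(t, 0.5) / t ** 5 for t in times]
```

No single power of t describes both behaviours near $\{z = z', t = 0\}$, which is the incompatibility that the blow-up resolves.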

For the case of a curvilinear polygonal domain $\Omega \subset \mathbb{R}^2$, we begin with $\Omega \times
\Omega \times [0, \infty)$ and perform a sequence of blow-ups. Our construction is inspired by
the construction of the heat kernel on manifolds with wedge singularities performed 
by Mazzeo and Vertman in [19]. We leave the details to our forthcoming work [23]. 

Once the double heat space has been constructed, the heat kernel may be built 
in a similar spirit to the Duhamel’s principle construction of the Robin heat kernel 
in the proof of Theorem 5. We start with a parametrix, or initial guess, and then 
use Duhamel’s principle to iterate away the error. This requires the proof of a 
composition result for operators whose kernels are well-behaved on our double heat 
space, and that in turn requires some fairly involved technical machinery (a proof 
“by hand” without using this machinery would be entirely unreadable). However, it 
works out and gives us a very precise description of the heat kernel on a curvilinear 
polygon, with any combination of Dirichlet, Neumann, and Robin conditions. 
Moreover, we are able to generalize our techniques and results to surfaces which 
have boundary, edges, corners, and conical singularities. 

The details of this sort of geometric microlocal analysis construction are intricate, 
but its utility is undeniable. In settings such as this, where exact geometric matches 
are lacking, but instead, one has asymptotic geometric matches, these microlocal 
techniques may be helpful. For the full story in the case of curvilinear polygons
and their heat kernels, please stay tuned for our forthcoming work [23]. We
have seen here that the heat kernels for circular sectors give the same angular
contribution, arising from the so-called “C term”, for both Neumann and Robin
boundary conditions. Moreover, this is the same in the Dirichlet case as well [29].
Interestingly, it appears that for mixed boundary conditions, there is a sudden change
in this corner contribution. We are in the process of obtaining a small collection of
negative isospectrality results in these general settings in the spirit of Theorem 1,
including a generalization of Theorem 1 which removes the hypothesis that the
corners are exact; see [23] for the full story.


Nonparametric Bayesian Volatility Estimation


Shota Gugushvili, Frank van der Meulen, Moritz Schauer, and Peter Spreij 


Abstract Given discrete time observations over a fixed time interval, we study 
a nonparametric Bayesian approach to estimation of the volatility coefficient of 
a stochastic differential equation. We postulate a histogram-type prior on the 
volatility with piecewise constant realisations on bins forming a partition of the 
time interval. The values on the bins are assigned an inverse Gamma Markov 
chain (IGMC) prior. Posterior inference is straightforward to implement via Gibbs 
sampling, as the full conditional distributions are available explicitly and turn 
out to be inverse Gamma. We also discuss in detail the hyperparameter selection 
for our method. Our nonparametric Bayesian approach leads to good practical 
results in representative simulation examples. Finally, we apply it to a classical
data set in change-point analysis: weekly closings of the Dow-Jones industrial
averages.


1 Introduction


1.1 Problem Formulation


Consider a one-dimensional stochastic differential equation (SDE)


\[
dX_t = b_0(t, X_t)\, dt + s_0(t)\, dW_t, \qquad X_0 = x, \quad t \in [0, T], \tag{1}
\]


where $b_0$ is the drift coefficient, $s_0$ the deterministic dispersion coefficient or
volatility, and x is a deterministic initial condition. Here W is a standard Brownian
motion. Assume that standard conditions for existence and uniqueness of a strong
solution to (1) are satisfied (see, e.g., [47]), and observations


\[
\mathcal{X}_n = \{X_{t_{0,n}}, \ldots, X_{t_{n,n}}\}
\]


are available, where $t_{i,n} = iT/n$, $i = 0, \ldots, n$. Using a nonparametric Bayesian
approach, our aim is to estimate the volatility function $s_0$. In a financial context,
knowledge of the volatility is of fundamental importance e.g. in pricing financial
derivatives; see [4] and [52]. However, SDEs have applications far beyond the
financial context as well, e.g. in physics, biology, life sciences, neuroscience and
engineering (see [1, 25, 40] and [76]). Note that by Itô’s formula, using a simple
transformation of the state variable, also an SDE of the form


\[
dX_t = b_0(t, X_t)\, dt + s_0(t) f_0(X_t)\, dW_t, \qquad X_0 = x, \quad t \in [0, T],
\]


can be reduced to the form (1), provided the function $f_0$ is known and regular
enough; see, e.g., p. 186 in [66]. Some classical examples that fall under our
statistical framework are the geometric Brownian motion and the Ornstein-Uhlenbeck
process. Note also that as we allow the drift in (1) to be non-linear, marginal 
distributions of X are not necessarily Gaussian and may thus exhibit heavy tails, 
which is attractive in financial modelling. 
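
To make the observation scheme concrete, the following sketch simulates a path of (1) by the Euler-Maruyama method and records it on the grid $t_{i,n} = iT/n$. The discretisation scheme, the function name and the `substeps` refinement parameter are assumptions made for illustration; the paper does not prescribe a simulation method at this point.

```python
import math
import random

def simulate_sde(drift, volatility, x0, T, n, substeps=10, rng=None):
    """Euler-Maruyama approximation of dX = drift(t, X) dt + volatility(t) dW.

    Returns the n + 1 values X_{t_0}, ..., X_{t_n} on the grid t_i = i T / n;
    each observation interval is refined into `substeps` Euler steps."""
    if rng is None:
        rng = random.Random()
    dt = T / (n * substeps)
    x, t = x0, 0.0
    path = [x0]
    for _ in range(n):
        for _ in range(substeps):
            x += drift(t, x) * dt + volatility(t) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            t += dt
        path.append(x)
    return path

# Example: mean-reverting drift with a time-varying volatility (hypothetical choices).
example_path = simulate_sde(
    drift=lambda t, x: -x,
    volatility=lambda t: 1.0 + 0.5 * math.sin(2.0 * math.pi * t),
    x0=0.0, T=1.0, n=200, rng=random.Random(0),
)
increments = [b - a for a, b in zip(example_path, example_path[1:])]
```

The increments computed in the last line are the quantities on which the pseudo-likelihood of Sect. 2 is built.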

A nonparametric approach guards one against model misspecification and is an 
excellent tool for a preliminary, exploratory data analysis, see, e.g., [65]. Commonly 
acknowledged advantages of a Bayesian approach include automatic uncertainty 
quantification in parameter estimates via Bayesian credible sets, and the fact 
that it is a fundamentally likelihood-based method. In [51] it has been argued 
that a nonparametric Bayesian approach is important for honest representation of 
uncertainties in inferential conclusions. Furthermore, use of a prior allows one to 
easily incorporate the available external, a priori information into the estimation 
procedure, which is not straightforward to achieve with frequentist approaches. For
instance, this a priori information could be an increasing or decreasing trend in the
volatility.


1.2 Literature Overview


Literature on nonparametric Bayesian volatility estimation in SDE models is scarce. 
We can list theoretical contributions [34, 36, 54], and the practically oriented paper 
[2]. The model in the former two papers is close to the one considered in the present 
work, but from the methodological point of view different Bayesian priors are used 
and practical usefulness of the corresponding Bayesian approaches is limited. On 
the other hand, the models considered in [54] and [2] are rather different from ours, 
and so are the corresponding Bayesian approaches. The nearest predecessor of the 
model and the method in our paper is the one studied in [37]. In the sequel we 
will explain in what aspects the present contribution differs from that one and what 
the current improvements are. We note in passing that there exists a solid body 
of literature on nonparametric Bayesian estimation of the drift coefficient, see, e.g.,
[35, 55, 57, 63, 70, 71] and the review article [72], but Bayesian volatility estimation
requires use of substantially different ideas. We also note existence of works dealing 
with parametric Bayesian estimation in discrete-time stochastic volatility models, 
see, e.g., [45] and [46], but again, these are not directly related to the problem we 
study in this paper. 


1.3 Approach and Results


The main potential difficulties facing a Bayesian approach to inference in SDE 
models from discrete observations are an intractable likelihood and absence of a 
closed form expression for the posterior distribution; see, e.g., [21, 25, 62] and 
[69]. Typically, these difficulties necessitate the use of a data augmentation device 
(see [67]) and some intricate form of a Markov chain Monte Carlo (MCMC) 
sampler (see [61]). In [37], these difficulties are circumvented by intentionally 
setting the drift coefficient to zero, and employing a (conjugate) histogram-type 
prior on the diffusion coefficient, that has piecewise constant realisations on bins 
forming a partition of [0, T]. Specifically, the (squared) volatility is modelled a
priori as a function $s^2 = \sum_{k=1}^N \theta_k 1_{B_k}$, with independent and identically distributed
inverse gamma coefficients $\theta_k$’s, and the prior $\Pi$ is defined as the law of $s^2$.
Here $B_1, \ldots, B_N$ are bins forming a partition of [0, T]. With this independent
inverse Gamma (IG) prior, $\theta_1, \ldots, \theta_N$ are independent, conditional on the data,
and of inverse gamma type. Therefore, this approach results in a fast and simple to
understand and implement Bayesian procedure. A study of its favourable practical
performance, as well as its theoretical validation was recently undertaken in [37].
As shown there under precise regularity conditions, misspecification of the drift is
asymptotically, as the sample size $n \to \infty$, harmless for consistent estimation of
the volatility coefficient.

Despite a good practical performance of the method in [37], there are some
limitations associated with it too. In particular, the method offers limited possibilities for
adaptation to the local structure of the volatility coefficient, which may become
an issue if the volatility has a wildly varying curvature on the time interval
[0, T]. A possible fix to this would be to equip the number of bins N forming
a partition of [0, T] with a prior, and choose the endpoints of bins $B_k$ also
according to a prior. However, this would force one to go beyond the conjugate
Bayesian setting as in [37], and posterior inference in practice would require, for
instance, the use of a reversible jump MCMC algorithm (see [32]). Even in the 
incomparably simpler setting of intensity function estimation for nonhomogeneous 
Poisson processes with histogram-type priors, this is very challenging, as observed 
in [77]. Principal difficulties include designing moves between models of differing 
dimensions that result in MCMC algorithms that mix well, and assessment of 
convergence of Markov chains (see [22], p. 204). Thus, e.g., the inferential 
conclusions in [32] and [33] are different on the same real data example using the 
same reversible jump method, since it turned out that in the first paper the chain 
was not run long enough. Cf. also the remarks on Bayesian histograms in [27], 
p. 546. 

Here we propose an alternative approach, inspired by ideas in [7] in the context 
of audio signal modelling different from the SDE setting that we consider; see also 
[8, 9, 15, 16] and [73]. Namely, instead of using a prior on the (squared) volatility
that has piecewise constant realisations on [0, T] with independent coefficients $\theta_k$’s,
we will assume that the sequence $\{\theta_k\}$ forms a suitably defined Markov chain. An
immediately apparent advantage of using such an approach is that it induces extra
smoothing via dependence in prior realisations of the volatility function across
different bins. Arguing heuristically, with a large number N of bins $B_k$ it is then
possible to closely mimic the local structure of the volatility: in those parts of
the interval [0, T], where the volatility has a high curvature or is subject to abrupt
changes, a large number of (narrow) bins is required to adequately capture these
features. However, the grid used to define the bins $B_k$’s is uniform, and if $\theta_1, \ldots, \theta_N$
are a priori independent, a large N may induce spurious variability in the volatility
estimates in those regions of [0, T] where the volatility in fact varies slowly. As
we will see in the sequel, this problem may be alleviated using a priori dependent
$\theta_k$’s.

In the subsequent sections we detail our approach, and study its practical 
performance via simulation and real data examples. Specifically, we implement 
our method via a straightforward version of the Gibbs sampler, employing the 
fact that full conditional distributions of $\theta_k$’s are known in closed form (and are
in fact inverse gamma). Unlike [37], posterior inference in our new approach
requires the use of MCMC. However, this is offset by the advantages of our new
approach outlined above, and in fact the additional computational complexity of
our new method is modest in comparison to [37]. The prior in our new method
depends on hyperparameters, and we will also discuss several ways of their choice
in practice.


1.4 Organisation of This Paper


In Sect. 2 we supply a detailed description of our nonparametric Bayesian approach
to volatility estimation. In Sect. 3 we study the performance of our method via
extensive simulation examples. In Sect. 4 we apply the method to a real data
example. Section 5 summarises our findings and provides an outlook on our
results. Finally, Sect. 6 contains some additional technical details of our
procedure.


1.5 Notation 


We denote the prior distribution on the (squared) volatility function by $\Pi$ and write
the posterior measure given data $\mathcal{X}_n$ as $\Pi(\cdot \mid \mathcal{X}_n)$. We use the notation IG($\alpha$, $\beta$)
for the inverse gamma distribution with shape parameter $\alpha > 0$ and scale parameter
$\beta > 0$. This distribution has a density


\[
x \mapsto \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{-\alpha - 1} e^{-\beta/x}, \qquad x > 0. \tag{2}
\]


For two sequences $\{a_n\}$, $\{b_n\}$, the notation $a_n \asymp b_n$ will be used to denote the fact
that the sequences are asymptotically (as $n \to \infty$) of the same order. Finally, for a
density f and a function g, the notation $f \propto g$ will mean that f is proportional to
g, with proportionality constant on the right-hand side recovered as $(\int g)^{-1}$, where
the integral is over the domain of definition of g (and of f). The function g can be 
referred to as an unnormalised probability density. 


2 Nonparametric Bayesian Approach 


2.1 Generalities 


Our starting point is the same as in [37]. Namely, we misspecify the drift 
coefficient bo by intentionally setting it to zero (see also [49] for a similar idea 
of ‘misspecification on purpose’). The theoretical justification for this under the 
‘infill’ asymptotics, with the time horizon T staying fixed and the observation times 
$t_{i,n} = iT/n$, $i = 1, \ldots, n$, filling up the interval [0, T] as $n \to \infty$, is provided
in [37], to which we refer for further details (the argument there ultimately relies
on Girsanov’s theorem). Similar ideas are also encountered in the non-Bayesian
setting in the econometrics literature on high-frequency financial data, see, e.g.,
[53].

Set $Y_{i,n} = X_{t_{i,n}} - X_{t_{i-1,n}}$. With the assumption $b_0 = 0$, the pseudo-likelihood of
our observations is tractable, in fact Gaussian,


\[
L_n(s^2) = \prod_{i=1}^n \Bigg\{ \frac{1}{\sqrt{2\pi \int_{t_{i-1,n}}^{t_{i,n}} s^2(u)\, du}}\;
\psi\Bigg( \frac{Y_{i,n}}{\sqrt{\int_{t_{i-1,n}}^{t_{i,n}} s^2(u)\, du}} \Bigg) \Bigg\}, \tag{3}
\]


where $\psi(u) = \exp(-u^2/2)$. The posterior probability of any measurable set S of
volatility functions can be computed via Bayes’ theorem as


\[
\Pi(S \mid \mathcal{X}_n) = \frac{\int_S L_n(s^2)\, \Pi(ds)}{\int L_n(s^2)\, \Pi(ds)}.
\]


Here the denominator is the normalising constant, the integral over the whole space
on which the prior $\Pi$ is defined, which ensures that the posterior is a probability
measure (i.e. integrates to one).


2.2 Prior Construction


Our prior $\Pi$ is constructed similarly to [37], with an important difference to be
noted below. Fix an integer $m < n$. Then $n = mN + r$ with $0 \le r < m$, where
$N = \lfloor n/m \rfloor$. Now define bins $B_k = [t_{m(k-1),n}, t_{mk,n})$, $k = 1, \ldots, N - 1$, and $B_N =
[t_{m(N-1),n}, T]$. Thus the first $N - 1$ bins are of length $mT/n$, whereas the last bin
$B_N$ has length $T - t_{m(N-1),n} = n^{-1}(r + m)T < n^{-1} 2mT$. The parameter N
(equivalently, m) is a hyperparameter of our prior. We model s as piecewise constant
on bins $B_k$, thus $s = \sum_{k=1}^N \xi_k 1_{B_k}$. The prior $\Pi$ on the volatility s can now be defined
by assigning a prior to the coefficients $\xi_k$’s.
Let $\theta_k = \xi_k^2$. Since the bins $B_k$ are disjoint,


\[
s^2 = \sum_{k=1}^N \xi_k^2 1_{B_k} = \sum_{k=1}^N \theta_k 1_{B_k}.
\]
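

The bin construction can be written out directly. The sketch below computes N, r and the bin endpoints from n, m and T, and maps the index of an increment $Y_{i,n}$ to the bin containing it; the function names are hypothetical.

```python
def make_bins(n, m, T):
    """Return (N, r, edges) for the partition of [0, T] into bins B_1, ..., B_N.

    edges[k - 1] and edges[k] are the endpoints of B_k; the last bin absorbs
    the r leftover observation intervals and therefore has length (r + m) T / n."""
    if not 0 < m < n:
        raise ValueError("need 0 < m < n")
    N, r = divmod(n, m)
    edges = [m * k * T / n for k in range(N)] + [T]
    return N, r, edges

def bin_of_increment(i, m, N):
    """1-based index k of the bin containing the i-th increment, i = 1, ..., n."""
    return min((i - 1) // m + 1, N)

# Example: n = 10 observations grouped in blocks of m = 3 on [0, 1].
N, r, edges = make_bins(10, 3, 1.0)
bin_lengths = [b - a for a, b in zip(edges, edges[1:])]
```

In this example the last bin has length $(r + m)T/n$ with r = 1, so it is longer than the others, as described in the text.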


As the likelihood depends on s only through its square $s^2$, it suffices to assign the
prior to the coefficients $\theta_k$’s of $s^2$. This is the point where we fundamentally diverge
from [37]. Whereas in [37] it is assumed that $\{\theta_k\}$ is an i.i.d. sequence of inverse
gamma random variables, here we suppose that $\{\theta_k\}$ forms a Markov chain. This
will be referred to as an inverse Gamma Markov chain (IGMC) prior (see [7]), and
is defined as follows. Introduce auxiliary variables $\zeta_k$, $k = 2, \ldots, N$, and define a
Markov chain using the time ordering $\theta_1, \zeta_2, \theta_2, \ldots, \zeta_k, \theta_k, \ldots, \zeta_N, \theta_N$. Transition
distributions of this chain are defined as follows: fix hyperparameters $\alpha_1$, $\alpha_\zeta$ and $\alpha$,
and set


\[
\theta_1 \sim \mathrm{IG}(\alpha_1, \alpha_1), \qquad
\zeta_{k+1} \mid \theta_k \sim \mathrm{IG}(\alpha_\zeta, \alpha_\zeta \theta_k^{-1}), \qquad
\theta_{k+1} \mid \zeta_{k+1} \sim \mathrm{IG}(\alpha, \alpha \zeta_{k+1}^{-1}). \tag{4}
\]


The name of the chain reflects the fact that these distributions are inverse Gamma.
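

A prior realisation of the chain (4) can be drawn by alternating the two conditional distributions. The sketch below uses the standard library generator and the fact that the reciprocal of a gamma variable with shape a and rate b has the IG(a, b) distribution in the shape and scale parametrisation of (2); the function names and the option of fixing $\theta_1$ are assumptions made for illustration.

```python
import random

def sample_inverse_gamma(shape, scale, rng):
    """Draw from IG(shape, scale) as the reciprocal of a Gamma(shape, rate=scale) draw."""
    # random.gammavariate takes (shape, scale of the gamma), hence 1 / scale here.
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)

def sample_igmc(N, alpha, alpha_zeta, alpha_1, rng, theta_1=None):
    """One prior draw (theta_1..theta_N, zeta_2..zeta_N) from the chain (4).

    If theta_1 is given it is used as a fixed starting value instead of a draw
    from IG(alpha_1, alpha_1)."""
    if theta_1 is None:
        theta_1 = sample_inverse_gamma(alpha_1, alpha_1, rng)
    theta, zeta = [theta_1], []
    for _ in range(N - 1):
        z = sample_inverse_gamma(alpha_zeta, alpha_zeta / theta[-1], rng)
        zeta.append(z)
        theta.append(sample_inverse_gamma(alpha, alpha / z, rng))
    return theta, zeta

# Example draw with illustrative hyperparameter values and a fixed starting point.
rng = random.Random(0)
theta, zeta = sample_igmc(50, alpha=30.0, alpha_zeta=30.0, alpha_1=0.1, rng=rng, theta_1=500.0)
```

Larger values of $\alpha$ and $\alpha_\zeta$ concentrate each conditional distribution and so produce smoother realisations.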


Remark 1 Our definition of the IGMC prior differs from the one in [7] in the choice
of the initial distribution of $\theta_1$, which is important to alleviate possible ‘edge effects’
in volatility estimates in a neighbourhood of $t = 0$. The parameter $\alpha_1$ determines
the initial distribution of the inverse Gamma Markov chain. Letting $\alpha_1 \to 0$ (which
corresponds to a vague prior) ‘releases’ the chain at the time origin. □


Remark 2 As observed in [7], there are various ways of defining an inverse
Gamma Markov chain. The point to be kept in mind is that the resulting posterior
should be computationally tractable, and the prior on $\theta_k$’s should have a capability
of producing realisations with positive correlation structures, as this introduces
smoothing among the $\theta_k$’s in adjacent bins. This latter property is not possible
to attain with arbitrary constructions of inverse Gamma Markov chains, such as
e.g. a natural construction $\theta_k \mid \theta_{k-1} \sim \mathrm{IG}(\alpha, \theta_{k-1}/\alpha)$. On the other hand, positive
correlation between realisations $\theta_k$’s can be achieved e.g. by setting $\theta_k \mid \theta_{k-1} \sim
\mathrm{IG}(\alpha, (\alpha\theta_{k-1})^{-1})$, but this results in intractable posterior computations. The
definition of the IGMC prior in the present work, that employs latent variables $\zeta_k$’s,
takes care of both these important points. For an additional discussion see [7]. □


Remark 3 Setting the drift coefficient $b_0$ to zero effectively results in pretending
that the process X has independent (Gaussian) increments. In reality, since the
drift in practical applications is typically nonzero, increments of the process are
dependent, and hence all observations $Y_{i,n}$ contain some indirect information on
the value of the volatility $s^2$ at each time point $t \in [0, T]$. On the other hand,
assuming the IGMC prior on $s^2$ yields a posteriori dependence of coefficients $\{\theta_k\}$,
which should be of help in inference with smaller sample sizes n. See Sect. 4 for an
illustration. □


2.3 Gibbs Sampler 


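It can be verified by direct computations employing (4) that the full conditional
distributions of $\theta_k$’s and $\zeta_k$’s are inverse gamma,


\[
\theta_k \mid \zeta_k, \zeta_{k+1} \sim \mathrm{IG}\Big( \alpha + \alpha_\zeta,\; \frac{\alpha}{\zeta_k} + \frac{\alpha_\zeta}{\zeta_{k+1}} \Big), \qquad k = 2, \ldots, N - 1, \tag{5}
\]
\[
\theta_1 \mid \zeta_2 \sim \mathrm{IG}\Big( \alpha_1 + \alpha_\zeta,\; \alpha_1 + \frac{\alpha_\zeta}{\zeta_2} \Big), \tag{6}
\]
\[
\theta_N \mid \zeta_N \sim \mathrm{IG}\Big( \alpha,\; \frac{\alpha}{\zeta_N} \Big), \tag{7}
\]
\[
\zeta_k \mid \theta_k, \theta_{k-1} \sim \mathrm{IG}\Big( \alpha_\zeta + \alpha,\; \frac{\alpha_\zeta}{\theta_{k-1}} + \frac{\alpha}{\theta_k} \Big), \qquad k = 2, \ldots, N. \tag{8}
\]


See Sect. 6 for details. Next, the effective transition kernel of the Markov chain $\{\theta_k\}$
is given by formula (4) in [7], and is a scale mixture of inverse gamma distributions;
however, its exact expression is of no direct concern for our purposes. As noted in
[7], p. 700, depending on the parameter values $\alpha$, $\alpha_\zeta$, it is possible for the chain
$\{\theta_k\}$ to exhibit either an increasing or decreasing trend. We illustrate this point by
plotting realisations of $\{\theta_k\}$ in Fig. 1 for different values of $\alpha$ and $\alpha_\zeta$. In the context
of volatility estimation this feature is attractive, if prior information on the volatility
trend is available.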

Inference in [7] is performed using a mean-field variational Bayes approach, 
see, e.g., [5]. Here we describe instead a fully Bayesian approach relying on Gibbs 
sampling (see, e.g., [26] and [29]), cf. [9]. 

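The algorithm is initialised at values $\zeta_2, \ldots, \zeta_N$, e.g. generated from the prior
specification (4). In order to derive update formulae for the full conditionals of the
$\theta_k$’s, define


\[
Z_k = \sum_{i=(k-1)m+1}^{km} Y_{i,n}^2, \qquad k = 1, \ldots, N - 1,
\]
\[
Z_N = \sum_{i=(N-1)m+1}^{n} Y_{i,n}^2.
\]


With this notation, the likelihood from (3) satisfies


\[
L_n(\theta) \propto \theta_N^{-(m+r)/2} \exp\Big( -\frac{n Z_N}{2T\theta_N} \Big)
\prod_{k=1}^{N-1} \theta_k^{-m/2} \exp\Big( -\frac{n Z_k}{2T\theta_k} \Big).
\]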
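

The statistics $Z_k$ and the logarithm of the right-hand side above are straightforward to compute. The sketch below does so for a list of increments $Y_{1,n}, \ldots, Y_{n,n}$; the function names are hypothetical, and the log-likelihood is returned only up to an additive constant, in line with the proportionality sign.

```python
import math

def bin_statistics(increments, m):
    """Z_1, ..., Z_N: sums of squared increments per bin (the last bin takes the leftovers)."""
    n = len(increments)
    N = n // m
    Z = [sum(y * y for y in increments[(k - 1) * m:k * m]) for k in range(1, N)]
    Z.append(sum(y * y for y in increments[(N - 1) * m:]))
    return Z

def log_pseudo_likelihood(theta, Z, n, m, T):
    """log L_n(theta) up to an additive constant; theta[k - 1] is the value of s^2 on B_k."""
    N = len(theta)
    r = n - m * N
    counts = [m] * (N - 1) + [m + r]  # number of increments falling in each bin
    return sum(-0.5 * c * math.log(th) - n * z / (2.0 * T * th)
               for c, th, z in zip(counts, theta, Z))

# Small example with n = 7 increments and m = 3, so N = 2 and r = 1.
example_increments = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
Z = bin_statistics(example_increments, 3)
```

Maximising each factor separately gives $\theta_k = nZ_k/(T c_k)$, where $c_k$ is the number of increments in bin k, which is a convenient sanity check for an implementation.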


Fig. 1 Realisations of the Markov chain $\{\theta_k\}$ with $\alpha = 40$, $\alpha_\zeta = 20$ (left panel), $\alpha = 30$,
$\alpha_\zeta = 30$ (center panel) and $\alpha = 20$, $\alpha_\zeta = 40$ (right panel). In all cases, $\theta_1$ is fixed to 500

Using this formula and Eq. (5), and recalling the form of the inverse gamma
density (2), it is seen that the update distribution for $\theta_k$, $k = 2, \ldots, N - 1$, is


\[
\mathrm{IG}\Big( \alpha + \alpha_\zeta + \frac{m}{2},\; \frac{\alpha}{\zeta_k} + \frac{\alpha_\zeta}{\zeta_{k+1}} + \frac{n Z_k}{2T} \Big),
\]


whereas by (6) and (7) the ones for $\theta_1$ and $\theta_N$ are


\[
\mathrm{IG}\Big( \alpha_1 + \alpha_\zeta + \frac{m}{2},\; \alpha_1 + \frac{\alpha_\zeta}{\zeta_2} + \frac{n Z_1}{2T} \Big), \qquad
\mathrm{IG}\Big( \alpha + \frac{m + r}{2},\; \frac{\alpha}{\zeta_N} + \frac{n Z_N}{2T} \Big),
\]

respectively. 

Next, the latent variables ζ_k are updated using formula (8). This update step
for the ζ_k's does not directly involve the data, except through the previous values of
the θ_k's.

Finally, one iterates these two Gibbs steps for the θ_k's and ζ_k's a large number
of times (until the chains can be assessed as reasonably converged), which gives
posterior samples of the θ_k's. Using the latter, posterior inference can proceed
in the usual way, e.g. by computing the sample posterior mean of the θ_k's, as well
as sample quantiles, which provide, respectively, a point estimate and uncertainty
quantification via marginal Bayesian credible bands for the squared volatility s².
Similar calculations on the square roots of the posterior samples can be used to 
obtain point estimates and credible bands for the volatility function s itself. 
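
The two Gibbs steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' Julia implementation. It assumes that the prior (4) has the form θ_1 ~ IG(α_1, α_1), ζ_{k+1} | θ_k ~ IG(α_ζ, α_ζ/θ_k), θ_{k+1} | ζ_{k+1} ~ IG(α, α/ζ_{k+1}), that the hyperparameters are held fixed, and that n = mN exactly (so r = 0). The names `rinvgamma`, `gibbs_igmc` and `posterior_mean` are hypothetical.

```python
import random


def rinvgamma(shape, rate, rng):
    """Draw from IG(shape, rate): the reciprocal of a Gamma(shape, scale=1/rate) draw."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


def gibbs_igmc(increments, N, alpha, alpha_zeta, alpha_1, n_iter, seed=0):
    """Gibbs sampler sketch for the bin coefficients theta_1..theta_N."""
    rng = random.Random(seed)
    n = len(increments)
    m, r = divmod(n, N)
    if r != 0:
        raise ValueError("this sketch assumes n = m * N (no remainder r)")
    # Z[k]: sum of squared increments falling into bin k (0-based bins).
    Z = [sum(y * y for y in increments[k * m:(k + 1) * m]) for k in range(N)]
    # zeta[k] links theta[k-1] and theta[k]; zeta[0] is unused.
    # Crude start at 1.0 (an assumption; the text suggests draws from the prior).
    zeta = [1.0] * N
    theta = [1.0] * N
    draws = []
    for _ in range(n_iter):
        # Step 1: update every theta_k from its inverse gamma full conditional.
        for k in range(N):
            shape = m / 2.0
            rate = n * Z[k] / 2.0
            if k == 0:              # first bin: contribution of the prior on theta_1
                shape += alpha_1
                rate += alpha_1
            else:                   # link to the preceding latent variable
                shape += alpha
                rate += alpha / zeta[k]
            if k < N - 1:           # link to the following latent variable
                shape += alpha_zeta
                rate += alpha_zeta / zeta[k + 1]
            theta[k] = rinvgamma(shape, rate, rng)
        # Step 2: update the latent variables given the neighbouring thetas.
        for k in range(1, N):
            zeta[k] = rinvgamma(alpha_zeta + alpha,
                                alpha_zeta / theta[k - 1] + alpha / theta[k], rng)
        draws.append(list(theta))
    return draws


def posterior_mean(draws, burn_in):
    """Sample posterior mean of each theta_k after discarding the burn-in."""
    kept = draws[burn_in:]
    return [sum(d[k] for d in kept) / len(kept) for k in range(len(kept[0]))]
```

Sample quantiles of the retained draws (and of their square roots) would then give the marginal credible bands mentioned in the text.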


2.4 Hyperparameter Choice 


We first assume the number of bins N has been chosen in some way, and we 
only have to deal with hyperparameters α, α_ζ and α_1, that govern properties of
the Markov chain prior. In [7], where an IGMC prior was introduced, guidance on 
the hyperparameter selection is not discussed. In [8], the hyperparameters are fine- 
tuned by hand in specific problems studied there (audio denoising and single channel 
audio source separation). Another practical solution is to try several different fixed 
combinations of the hyperparameters α, α_ζ and α_1, if only to verify sensitivity
of inferential conclusions with respect to variations in the hyperparameters. Some 
further methods for hyperparameter optimisation are discussed in [16]. In [8] opti- 
misation of the hyperparameters via the maximum likelihood method is suggested; 
practical implementation relies on the EM algorithm (see [13]), and some additional 
details are given in [15]. Put in other terms, the proposal in [15] amounts to using 
an empirical Bayes method (see, e.g., [20], [59] and [60]). The use of the latter 
is widespread and often leads to good practical results, but the method is still 
insufficiently understood theoretically, except in toy models like the white noise 
model (see, however, [17] and [56] for some results in other contexts). On the 
practical side, in our case, given that the dimension of the sequences {ζ_k} and {θ_k} is
rather high, namely 2N − 1 with N large, and the marginal likelihood is not available
in closed form, this approach is expected to be computationally intensive. Therefore,
a priori there is no reason not to try instead a fully Bayesian approach by equipping 
the hyperparameters with a prior, and this is in fact our default approach in the 
present work. However, the corresponding full conditional distribution turns out to 
be nonstandard, which necessitates the use of a Metropolis-Hastings step within the 
Gibbs sampler (see, e.g., [39], [50] and [68]). We provide the necessary details in 
Sect. 6. 

Finally, we briefly discuss the choice of the hyperparameter N. As argued in [37], 
in practice it is recommended to use the theoretical results in [37] (which suggest
taking N ≍ n^{λ/(2λ+1)} if the true volatility function s_0 is λ-Hölder smooth) and try
several values of N simultaneously. Different N’s all provide information on the 
unknown volatility, but at different resolution levels; see Section 5 in [37] for an 
additional discussion. As we will see in simulation examples in Sect. 3, inferential 
conclusions with the IGMC prior are quite robust with respect to the choice of 
N. This is because through the hyperparameters α and α_ζ, the IGMC prior has
an additional layer for controlling the amount of applied smoothness; when α and
α_ζ are equipped with a prior (as above), they can in fact be learned from the data.


3 Synthetic Data Examples 


Computations in this section have been done in the programming language Julia, 
see [3]. In order to test the practical performance of our estimation method, we use 
a challenging example with the blocks function from [18]. As a second example, we 
consider the case of the Cox-Ingersoll-Ross model. Precise details are given in the
subsections below. 

We used the Euler scheme on a grid with 800,001 equidistant points on the 
interval [0, 1] to obtain realisations of a solution to (1) for different combinations 
of the drift and dispersion coefficients. These were then subsampled to obtain 
n = 4000 observations in each example. 
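
The simulation step just described can be sketched as follows. This is a hedged illustration in Python rather than the Julia code used for the paper; it assumes that (1) has the form dX_t = b_0(t, X_t) dt + s_0(t) dW_t on [0, 1], and the function names, the initial value and the seed are hypothetical choices.

```python
import math
import random


def euler_path(drift, vol, x0, n_steps, seed=0):
    """Euler scheme for dX_t = drift(t, X_t) dt + vol(t) dW_t on [0, 1]."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    sqrt_dt = math.sqrt(dt)
    x = x0
    path = [x0]
    for i in range(n_steps):
        t = i * dt
        # One Euler step: drift term plus a Gaussian increment scaled by the volatility.
        x = x + drift(t, x) * dt + vol(t) * sqrt_dt * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def subsample(path, n_obs):
    """Keep every (len(path) - 1) / n_obs-th point, giving n_obs increments."""
    step, rem = divmod(len(path) - 1, n_obs)
    if rem != 0:
        raise ValueError("fine grid must be an integer multiple of n_obs")
    return path[::step]
```

With `n_steps = 800_000` (that is, 800,001 grid points) and `n_obs = 4000`, `subsample` keeps every 200th point, which matches the grid sizes quoted in the text.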

The hyperparameter α_1 was set to 0.1, whereas for the other two hyperparameters
we assumed that α = α_ζ and used a diffuse IG(0.3, 0.3) prior, except in specially
noted cases below. Inference was performed using the Gibbs sampler from Sect. 2,
with a Metropolis-Hastings step to update the hyperparameter α. The latter used an
independent Gaussian random walk proposal with a scaling chosen to ensure an acceptance
rate of ca. 50%; see Sect. 6. The Gibbs sampler was run for 200,000 iterations and
we used a burn-in of 1000 samples. In each example we plotted 95% marginal
credible bands obtained from the central posterior intervals for the coefficients
ξ_k = √θ_k.


3.1 Blocks Function 


As our first example, we considered the case when the volatility function was given 
by the blocks function from [18]. With a vertical shift for positivity, this is defined 
as follows:

\[
s(t) = 10 + 3.655606 \times \sum_{j=1}^{11} h_j K(t - t_j), \quad t \in [0, 1], \tag{9}
\]

where K(t) = (1 + sgn(t))/2, and

{t_j} = (0.1, 0.13, 0.15, 0.23, 0.25, 0.4, 0.44, 0.65, 0.76, 0.78, 0.81),
{h_j} = (4, −5, 3, −4, 5, −4.2, 2.1, 4.3, −3.1, 2.1, −4.2).
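
For concreteness, the shifted blocks function (9) can be evaluated with a few lines of Python. This is only a sketch using the constants listed above; the names `T_J`, `H_J`, `K` and `blocks_volatility` are hypothetical.

```python
T_J = (0.1, 0.13, 0.15, 0.23, 0.25, 0.4, 0.44, 0.65, 0.76, 0.78, 0.81)
H_J = (4, -5, 3, -4, 5, -4.2, 2.1, 4.3, -3.1, 2.1, -4.2)


def K(t):
    """K(t) = (1 + sgn(t)) / 2, with sgn(0) = 0."""
    sgn = (t > 0) - (t < 0)
    return (1 + sgn) / 2


def blocks_volatility(t):
    """Blocks function with the vertical shift and scaling from (9)."""
    return 10 + 3.655606 * sum(h * K(t - tj) for h, tj in zip(H_J, T_J))
```

Since the partial sums of the h_j never fall below −2, the function stays above 10 − 2 × 3.655606 ≈ 2.69 on [0, 1], which is what the vertical shift is meant to guarantee.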


The function serves as a challenging benchmark example in nonparametric regres- 
sion: it is mostly very smooth, but spatially inhomogeneous and characterised 
by abrupt changes (cf. Chap.9 in [74]). Unlike nonparametric regression, the 
noise (Wiener process) in our setting should be thought of as multiplicative and 
proportional to s rather than additive, which combined with the fact that s takes 
rather large values further complicates the inference problem. Our main goal here 
was to compare the performance of the IGMC prior-based approach to the IIG
prior-based one from [37]. To complete the SDE specification, our drift coefficient was
chosen to be a rather strong linear drift b_0(x) = −10x + 20.

In Fig. 2 we plot the blocks function (9) and the corresponding realisation of the 
process X used in this simulation run. 

The left and right panels of Fig. 3 contrast the results obtained using the IGMC
prior with N = 160 and α = α_ζ = 20 versus N = 320 and α = α_ζ = 40.
These plots illustrate the fact that increasing N has the effect of undersmoothing
prior realisations, which can be balanced by increasing the values of α, α_ζ, which has
the opposite smoothing effect. Because of this, in fact, both plots look quite similar.

The top left and top right panels of Fig. 4 give estimation results obtained with 
the IIG prior-based approach from [37]. The number of bins was again N = 160 
and N = 320, and in both these cases we used diffuse independent IG(0.1, 0.1) 
priors on the coefficients of the (squared) volatility function (see [37] for details). 


Fig. 2 The sample path of the process X from (9) (left panel) and the corresponding volatility
function s (right panel)
Fig. 3 Volatility function s from (9) with superimposed 95% marginal credible band for the IGMC
prior, using N = 160, α = α_ζ = 20 (left panel) and N = 320, α = α_ζ = 40 (right panel); in both
cases, α_1 = 0.1

Fig. 4 Volatility function s with superimposed 95% marginal credible band for the IIG prior
IG(0.1, 0.1), using N = 160 (top left panel) and N = 320 bins (top right panel). Volatility
function s from (9) with superimposed 95% marginal credible band for the IGMC prior, using
N = 160 (bottom left panel) and N = 320 bins (bottom right panel); in both cases, α_1 = 0.1 and
α = α_ζ ~ IG(0.3, 0.3)


These results have to be contrasted to those obtained with the IGMC prior, plotted
in the bottom left and bottom right panels of Fig. 4, where we assumed α_1 = 0.1
and α = α_ζ ~ IG(0.3, 0.3). The following conclusions emerge from Fig. 4:

• Although both the IGMC and IIG approaches recover globally the shape of
the volatility function, the IIG approach results in much greater uncertainty in
inferential conclusions, as reflected in wider marginal credible bands. The
effect is especially pronounced in the case N = 320, where the width of the
band for the IIG prior renders it almost useless for inference.

• The bands based on the IGMC prior look more ‘regular’ than the ones for the IIG
prior.

• Comparing the results to Fig. 3, we see the benefits of equipping the hyperparameters
α, α_ζ with a prior: credible bands in Fig. 3 do not adequately capture two
dips of the function s right before and after the point t = 0.2, since s completely
falls outside the credible bands there. Thus, an incorrect amount of smoothing is
used in Fig. 3.

• The method based on the IIG prior is sensitive to the bin number selection:
compare the top left panel of Fig. 4 using N = 160 bins to the top right panel
using N = 320 bins, where the credible band is much wider. On the other hand,
the method based on the IGMC prior automatically rebalances the amount of
smoothing it uses with different numbers of bins N, thanks to the hyperprior on
the parameters α, α_ζ; in fact, the bottom two plots in Fig. 4 look similar to each
other.


3.2 CIR Model 


Our core estimation procedure, as described in the previous sections, assumes that 
the volatility function is deterministic. In this subsection, however, in order to test 
the limits of applicability of our method and possibilities for future extensions, we 
applied it to a case where the volatility function was stochastic. The study in [53] 
lends support to this approach, but here we concentrate on practical aspects and 
defer the corresponding theoretical investigation until another occasion. 

Specifically, we considered the Cox-Ingersoll-Ross (CIR) model, or the square
root process,

\[
dX_t = (\eta_1 - \eta_2 X_t)\,dt + \eta_3 \sqrt{X_t}\,dW_t, \quad X_0 = x > 0, \quad t \in [0, T]. \tag{10}
\]

Here η_1, η_2, η_3 > 0 are parameters of the model. This diffusion process was
introduced in [23] and [24], and gained popularity in finance as a model for short-term
interest rates, see [11]. The condition 2η_1 > η_3² ensures strict positivity and
ergodicity of X. The volatility function s_0 from (1) now corresponds to a realisation
of a stochastic process t ↦ η_3 √X_t, where X solves the CIR equation (10).

We took arbitrary parameter values

\[
\eta_1 = 6, \quad \eta_2 = 3, \quad \eta_3 = 2, \quad x = 1. \tag{11}
\]
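
A CIR path with these parameter values, together with its realised volatility, could be generated along the following lines. This sketch is not the authors' code. A plain Euler step can push the discretised path slightly below zero, so the state is truncated at zero inside the drift and the square root; that truncation is an assumption introduced here, as are the names `cir_euler`, `realised_volatility` and `positivity_condition`.

```python
import math
import random


def positivity_condition(eta1, eta3):
    """Check the condition 2 * eta1 > eta3^2 for strict positivity of the CIR process."""
    return 2.0 * eta1 > eta3 ** 2


def cir_euler(eta1, eta2, eta3, x0, T, n_steps, seed=0):
    """Euler scheme for dX = (eta1 - eta2 X) dt + eta3 sqrt(X) dW, truncated at 0."""
    rng = random.Random(seed)
    dt = T / n_steps
    x = x0
    path = [x0]
    for _ in range(n_steps):
        xp = max(x, 0.0)  # guard against small negative excursions of the scheme
        x = x + (eta1 - eta2 * xp) * dt + eta3 * math.sqrt(xp * dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def realised_volatility(path, eta3):
    """Realised volatility t -> eta3 * sqrt(X_t) along a simulated path."""
    return [eta3 * math.sqrt(max(x, 0.0)) for x in path]
```

For the values in (11), `positivity_condition(6, 2)` holds since 12 > 4, and the realised volatility starts at 2 × √1 = 2.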
Fig. 5 The sample path of the process X from (10) (left panel) and the corresponding realised
volatility function s(ω) (right panel). The parameter values are given in (11)

Fig. 6 Volatility function s from (10) with superimposed 95% marginal credible band for the
IGMC prior, using N = 160 (left panel) and N = 320 bins (right panel); in both cases, α_1 = 0.1
and α = α_ζ ~ IG(0.3, 0.3)


A sample path of X is plotted in the left panel of Fig. 5, whereas the corresponding
volatility is given in the right panel of the same figure. In Fig. 6 we display
estimation results obtained with the IGMC prior, using N = 160 and N = 320
bins and hyperparameter specifications α_1 = 0.1 and α = α_ζ ~ IG(0.3, 0.3). A
conclusion that emerges from this figure is that our Bayesian method captures the 
overall shape of the realised volatility in a rather satisfactory manner. 


4 Dow-Jones Industrial Averages 


In this section we provide a reanalysis of a classical dataset in change-point 
detection in time series; see, e.g., [10, 14, 41, 42] and [43]. Specifically, we consider 
weekly closing values of the Dow-Jones industrial averages in the period from 2 July
1971 to 2 August 1974. In total there are 162 observations available, which constitute
a relatively small sample, and thus the inference problem is rather nontrivial. The 
data can be accessed as the dataset DWJ in the sde package (see [44]) in R (see 
[58]). See the left panel of Fig. 7 for a visualisation. In [43] the weekly data X_{t_i},
i = 1, …, n, are transformed into returns Y_{t_i} = (X_{t_i} − X_{t_{i−1}})/X_{t_{i−1}}, and the
least squares change-point estimation procedure from [12] has been performed.
Reproducing the corresponding computer code in R results in a change-point
estimate of 16 March 1973. That author speculates that this change-point is related
to the Watergate scandal.

Fig. 7 Dow-Jones weekly closings of industrial averages over the period from 2 July 1971 to
2 August 1974 (left panel) and the corresponding returns (right panel)

Similar to [43], parametric change-point analyses in [10, 14] and [42] give 
a change-point in the third week of March 1973. However, as noted in [43], 
examination of the plot of the time series Y_{t_i} (see Fig. 7, the right panel) indicates
that another change-point may be present in the data. Then dropping observations 
after 16 March 1973 and analysing the data for existence of a change-point using 
only the initial segment of the time series, the author discovers another change-point 
on 17 December 1971, which he associates with suspending the convertibility of the 
US dollar into gold under President Richard Nixon’s administration. 

From the above discussion it should be clear that nonparametric modelling of 
the volatility may provide additional insights for this dataset. We first informally
investigated whether an SDE driven by the Wiener process is a suitable
model for the data at hand. Many such models, e.g. the geometric Brownian
motion, a classical model for evolution of asset prices over time (also referred to as 
the Samuelson or Black-Scholes model), rely on an old tenet that returns of asset 
prices follow a normal distribution. Although the assumption has been empirically 
disproved for high-frequency financial data (daily or intraday data; see, e.g., [6, 19] 
and [48]), its violation is less severe for widely spaced data in time (e.g. weekly data, 
as in our case). In fact, the Shapiro-Wilk test that we performed in R on the returns
past the change-point 16 March 1973 did not reject the null hypothesis of normality 
(p-value 0.4). On the other hand, the quantile-quantile (QQ) plot of the same data 
does perhaps give an indication of a certain mild deviation from normality, see 
Fig. 8, where we also plotted a kernel density estimate of the data (obtained via 
the command density in R, with bandwidth determined automatically through 
cross-validation). 
Fig. 8 QQ plot of the returns of Dow-Jones weekly closings of industrial averages over the period
from 16 March 1973 to 2 August 1974 (left panel) and a kernel density estimate of the same data
(right panel)

Fig. 9 Sample autocorrelation (left panel) and partial autocorrelation functions of the returns of
Dow-Jones weekly closings of industrial averages over the period from 16 March 1973 to 2 August 1974


In Fig. 9 we plot the sample autocorrelation and partial autocorrelation functions
based on the returns Y_{t_i} past the change-point 16 March 1973. These do not give
decisive evidence against the assumption of independence of the Y_{t_i}'s. Neither does
the Ljung-Box test (the test is implemented in R via the command Box.test),
which yields a p-value of 0.057 when applied with 10 lags (the p-value is certainly
small, but not overwhelmingly so).
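
The quantities behind these diagnostics are simple to compute. The following Python sketch (not the R code used for the analysis; all function names are hypothetical) forms simple returns, the sample autocorrelations and the Ljung-Box statistic Q = n(n + 2) Σ_{k=1}^{h} ρ̂_k²/(n − k). The p-value is deliberately omitted to keep the example free of external libraries; under the null hypothesis Q is compared with a chi-squared distribution with h degrees of freedom.

```python
def simple_returns(prices):
    """Returns Y_i = (X_i - X_{i-1}) / X_{i-1} from a sequence of closing values."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(x, lags):
    """Ljung-Box statistic Q based on the first `lags` sample autocorrelations."""
    n = len(x)
    rho = sample_acf(x, lags)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))
```

A large value of Q relative to the chi-squared reference distribution would speak against independence of the returns.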

Summarising our findings, we detected only mild evidence against the assumption
that the returns of the Dow-Jones weekly closings of industrial averages (over
the period from 16 March 1973 to 2 August 1974, but similar conclusions can be reached
also over the other subperiods covered by the DWJ dataset) are approximately
independent and follow a normal distribution. Thus there is no strong a priori reason 
to believe that a geometric Brownian motion is an outright unsuitable model in 
this setting: it can be used as a first approximation. To account for time-variability 
of volatility (as suggested by the change-point analysis), we incorporate a time- 
dependent volatility function in the model, and for additional modelling flexibility 
we also allow a state-dependent drift. Setting Z_t = log(X_t/X_0), our model is thus
given by

\[
dZ_t = b_0(t, Z_t)\,dt + s_0(t)\,dW_t, \quad Z_0 = 0. \tag{12}
\]


An alternative here is to directly (i.e. without any preliminary transformation) 
model the Dow-Jones data using Eq. (1). We consider both possibilities, starting 
with the model (12). 

We used a vague prior on θ_1 corresponding to the limit α_1 → 0, whereas for
the other two hyperparameters we assumed α = α_ζ ~ IG(0.3, 0.3). The scaling in
the independent Gaussian random walk proposal in the Metropolis-Hastings step
was chosen so as to yield an acceptance rate of ca. 50%. The Gibbs
sampler was run for 200,000 iterations, and the first 1000 samples were dropped as
a burn-in. We present the estimation results we obtained using N = 13 and N = 26 
bins, see Fig. 10. Although the sample size n is quite small in this example, the data 
are informative enough to yield nontrivial inferential conclusions even with diffuse 
priors. Both plots in Fig. 10 are qualitatively similar and suggest: 


• A decrease in volatility at the end of 1971, which can be taken as corresponding
to the change-point in December 1971 identified in [43]. Unlike that author, we
do not directly associate it with suspending the convertibility of the US dollar
into gold (that took place in August 1971 rather than December 1971).

• A gradual increase in volatility over the subsequent period stretching until the end
of 1973. Rather than only the Watergate scandal (and a change-point in March
1973 as in [43]), there could be further economic causes for that, such as the
1973 oil crisis and the 1973-1974 stock market crash.

• A decrease in volatility starting in early 1974, compared to the immediately
preceding period.


In general, in this work we do not aim at identifying causes for changes in volatility 
regimes, but prefer to present our inference results, which may subsequently be used
in econometric analyses. 

Now we move to the Bayesian analysis of the data using model (1). The prior 
settings were as in the previous case, and we display the results in Fig. 11. The 
overall shapes of the inferred volatility functions are the same in both Figs. 10 and 
11, and hence similar conclusions apply. 

Finally, we stress the fact that our nonparametric Bayesian approach and change- 
point estimation are different in their scope: whereas our method aims at estimation 
of the entire volatility function, change-point estimation (as its name actually 
suggests) concentrates on identifying change-points in the variance of the observed 
time series, which is a particular feature of the volatility. To that end it assumes 
the (true) volatility function is piecewise constant, which on the other hand is not 
an assumption required in our method. Both techniques are useful, and each can
provide insights that may be difficult to obtain from the other.


296 Gugushvili et al. 


Fig. 10 Marginal 90% credible bands for the volatility function of the log Dow-Jones industrial
averages data. The left panel corresponds to N = 13 bins, while the right panel to N = 26 bins

Fig. 11 Marginal 90% credible bands for the volatility function of the Dow-Jones industrial
averages data. The left panel corresponds to N = 13 bins, while the right panel to N = 26
bins


5 Conclusions 


Bayesian inference for SDEs from discrete-time observations is a difficult task, 
owing to intractability of the likelihood and the fact that the posterior is not 
available in closed form. Posterior inference therefore typically requires the use 
of intricate MCMC samplers. Designing algorithms that result in Markov chains 
that mix well and explore efficiently the posterior surface is a highly nontrivial 
problem. Inspired by some ideas from the audio signal processing literature and 
our earlier work [37], in this paper we introduced a novel nonparametric Bayesian 
approach to estimation of the volatility coefficient of an SDE. Our method is 
easy to understand and straightforward to implement via Gibbs sampling, and 
performs well in practice. Thereby our hope is that our work will contribute to 
further dissemination and popularisation of a nonparametric Bayesian approach to 
inference in SDEs, specifically with financial applications in mind. In that respect, 
see [38], which builds upon the present paper and deals with Bayesian volatility
estimation under market microstructure noise. Our work can also be viewed as a
partial fulfillment of the anticipation in [31] that some ideas developed originally in the
context of audio and music processing “will also find use in other areas of science
and engineering, such as financial or biomedical data analysis”.

As a final remark, we do not attempt to provide a theoretical, i.e. asymptotic 
frequentist analysis of our new approach here (see, e.g., the recent monograph [30], 
and specifically [37] for such an analysis in the SDE setting), but leave this as a 
topic of future research. 


6 Formulae for Parameter Updates 


In this section we present additional details on the derivation of the update formulae 
for the Gibbs sampler from Sect.2. The starting point is to employ the Markov 
property from (4), and using the standard Bayesian notation, to write the joint
density of {ζ_k} and {θ_k} as

\[
p(\theta_1) \prod_{k=2}^{N} p(\zeta_k \mid \theta_{k-1})\, p(\theta_k \mid \zeta_k). \tag{13}
\]


6.1 Full Conditional Distributions 


We first indicate how (5) was derived. Insert expressions for the individual terms
in (13) from (4) and collect separately terms that depend on θ_k only, to see that the
density of the full conditional distribution of θ_k, k = 2, …, N − 1, is proportional to

\[
\theta_k^{-(\alpha + \alpha_\zeta) - 1}
\exp\left( -\theta_k^{-1} \left( \alpha \zeta_k^{-1} + \alpha_\zeta \zeta_{k+1}^{-1} \right) \right).
\]

Upon normalisation, this expression is the density of the IG(α + α_ζ, αζ_k^{-1} + α_ζζ_{k+1}^{-1})
distribution, which proves formula (5). Formula (7) follows directly from the last
expression in (4). Formula (8) is proved analogously to (5). Finally, (6) follows
from (4) and Bayes’ formula. Cf. also [15], Appendix B.6.
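
To illustrate the remark that (8) is obtained analogously to (5), the computation can be written out as below. This is a sketch under the assumption that the prior specification (4) reads ζ_k | θ_{k−1} ~ IG(α_ζ, α_ζ/θ_{k−1}) and θ_k | ζ_k ~ IG(α, α/ζ_k), with the IG(a, b) density proportional to x^{−a−1} e^{−b/x}; only the factors of (13) that involve ζ_k are retained.

```latex
\begin{align*}
p(\zeta_k \mid \theta_{k-1}, \theta_k)
  &\propto p(\zeta_k \mid \theta_{k-1})\, p(\theta_k \mid \zeta_k) \\
  &\propto \zeta_k^{-\alpha_\zeta - 1}
     \exp\!\left( -\frac{\alpha_\zeta}{\theta_{k-1}\zeta_k} \right)
     \times \zeta_k^{-\alpha}
     \exp\!\left( -\frac{\alpha}{\zeta_k \theta_k} \right) \\
  &= \zeta_k^{-(\alpha_\zeta + \alpha) - 1}
     \exp\!\left( -\zeta_k^{-1}
       \left( \alpha_\zeta \theta_{k-1}^{-1} + \alpha \theta_k^{-1} \right) \right),
\end{align*}
```

which, under the stated assumption, is the kernel of the IG(α_ζ + α, α_ζθ_{k−1}^{-1} + αθ_k^{-1}) distribution.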


6.2  Metropolis-Within-Gibbs Step 


Now we describe the Metropolis-Hastings step within the Gibbs sampler, which is
used to update the hyperparameters of our algorithm, in case the latter are equipped
with a prior. For simplicity, assume α = α_ζ (we note that such a choice is used
in practical examples in [8]), and suppose α is equipped with a prior, α ~ π. Let
the hyperparameter α_1 be fixed. Obviously, α_1 could have been equipped with a
prior as well, but this would have further slowed down our estimation procedure,
whereas the practical results we obtained in Sects. 3 and 4 are already satisfactory
with α_1 fixed. Using (4) and (13), one sees that the joint density of {ζ_k}, {θ_k} and α
is proportional to

\[
\pi(\alpha) \times \theta_1^{-\alpha_1 - 1} \times e^{-\alpha_1/\theta_1}
\times \prod_{k=2}^{N} \left\{
\frac{\alpha^{\alpha}}{\Gamma(\alpha)\,\theta_{k-1}^{\alpha}}\,
\zeta_k^{-\alpha-1} e^{-\alpha/(\theta_{k-1}\zeta_k)}
\times
\frac{\alpha^{\alpha}}{\Gamma(\alpha)\,\zeta_k^{\alpha}}\,
\theta_k^{-\alpha-1} e^{-\alpha/(\zeta_k\theta_k)}
\right\}.
\]

This in turn is proportional to

\[
q(\alpha) = \pi(\alpha) \times \left( \frac{\alpha^{\alpha}}{\Gamma(\alpha)} \right)^{2(N-1)}
\times \prod_{k=2}^{N} \left( \theta_{k-1}\theta_k\zeta_k^{2} \right)^{-\alpha}
\times \exp\left( -\alpha \sum_{k=2}^{N} \frac{1}{\zeta_k}
\left( \frac{1}{\theta_{k-1}} + \frac{1}{\theta_k} \right) \right).
\]

The latter expression is an unnormalised full conditional density of α, and can be
used in the Metropolis-within-Gibbs step to update α.

The rest of the Metropolis-Hastings step is standard, and the following approach
was used in our practical examples: pick a proposal kernel g(α′ | α), for instance a
Gaussian random walk proposal g(α′ | α) = φ_σ(α′ − α), where φ_σ is the density of
a normal random variable with mean zero and variance σ². Note that this specific
choice may result in proposing a negative value α′, which needs to be rejected
straightaway as invalid. Then, for computational efficiency, instead of moving to
another step within the Gibbs sampler, one keeps on proposing new values α′ until
a positive one is proposed. This is then accepted with probability

\[
A = \min\left( 1, \frac{q(\alpha')}{q(\alpha)} \, \frac{\Phi_\sigma(\alpha)}{\Phi_\sigma(\alpha')} \right),
\]

where Φ_σ(·) is the cumulative distribution function of a normal random variable
with mean zero and variance σ²; otherwise the current value α is retained. See
[75] for additional details and derivations. Finally, one moves to other steps in the
Gibbs sampler, namely to updating the ζ_k's and θ_k's, and iterates the procedure. The
acceptance rate in the Metropolis-Hastings step can be controlled through the scale
parameter σ of the proposal density φ_σ. Some practical rules for determination of
an optimal acceptance rate in the Metropolis-Hastings algorithm are given in [28],
and those for the Metropolis-within-Gibbs algorithm in [64].
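
The update of α described in this subsection can be sketched in Python as follows. This is an illustration under the stated assumption α = α_ζ, not the authors' implementation. It works on the log scale for numerical stability, repeats the Gaussian random walk proposal until a positive value is drawn, and corrects the acceptance ratio by the factor Φ_σ(α)/Φ_σ(α′). The names `log_q`, `log_ig_prior`, `normal_cdf` and `mh_update_alpha` are hypothetical, and the IG(0.3, 0.3) prior is used only as an example.

```python
import math
import random


def log_ig_prior(alpha, a=0.3, b=0.3):
    """Log density (up to a constant) of an IG(a, b) prior on alpha."""
    return -(a + 1.0) * math.log(alpha) - b / alpha


def log_q(alpha, theta, zeta, log_prior):
    """Log of the unnormalised full conditional q(alpha), assuming alpha = alpha_zeta.

    theta holds theta_1..theta_N; zeta holds zeta_2..zeta_N.
    """
    N = len(theta)
    value = log_prior(alpha)
    value += 2 * (N - 1) * (alpha * math.log(alpha) - math.lgamma(alpha))
    for k in range(1, N):
        z = zeta[k - 1]  # latent variable linking theta[k-1] and theta[k]
        value -= alpha * (math.log(theta[k - 1]) + math.log(theta[k]) + 2.0 * math.log(z))
        value -= alpha / z * (1.0 / theta[k - 1] + 1.0 / theta[k])
    return value


def normal_cdf(x, sigma):
    """CDF of a centred normal distribution with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def mh_update_alpha(alpha, theta, zeta, log_prior, sigma, rng):
    """One Metropolis-within-Gibbs update of alpha; returns (new value, accepted?)."""
    while True:  # keep proposing until a positive value is obtained
        proposal = alpha + sigma * rng.gauss(0.0, 1.0)
        if proposal > 0.0:
            break
    log_a = (log_q(proposal, theta, zeta, log_prior) - log_q(alpha, theta, zeta, log_prior)
             + math.log(normal_cdf(alpha, sigma)) - math.log(normal_cdf(proposal, sigma)))
    if rng.random() < math.exp(min(0.0, log_a)):
        return proposal, True
    return alpha, False
```

In a full sampler this function would be called once per Gibbs sweep, and σ would be tuned until the empirical acceptance rate is near the desired level.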
The Exact Asymptotics for Hitting Probability of a Remote Orthant
by a Multivariate Lévy Process: The Cramér Case


Konstantin Borovkov and Zbigniew Palmowski 


Abstract For a multivariate Lévy process satisfying the Cramér moment condition 
and having a drift vector with at least one negative component, we derive the 
exact asymptotics of the probability of ever hitting the positive orthant that is 
being translated to infinity along a fixed vector with positive components. This 
problem is motivated by the multivariate ruin problem introduced in Avram et al. 
(Ann Appl Probab 18:2421—2449, 2008) in the two-dimensional case. Our solution 
relies on the analysis from Pan and Borovkov (Preprint. arXiv:1708.09605, 2017) 
for multivariate random walks and an appropriate time discretization. 


1 Introduction 


In this note we consider the following large deviation problem for continuous 
time processes with independent increments that was motivated by the multivariate 
simultaneous ruin problem introduced in [1]. Let {X(t)}_{t≥0} be a d-dimensional
(d ≥ 2) right-continuous Lévy process with X(0) = 0. One is interested in finding
the precise asymptotics for the hitting probability of the orthant sG as s → ∞,
where

\[
G := g + Q^+
\]

for some fixed

\[
g \in Q^+, \qquad Q^+ := \{x = (x_1, \ldots, x_d) \in \mathbb{R}^d : x_j > 0,\ 1 \le j \le d\}.
\]

Clearly, sG = sg + Q⁺, which is just the positive orthant translated by sg.

We solve this problem under appropriate Cramér moment assumptions and 
further conditions on the process X and vertex g that, roughly speaking, ensure 
that the “most likely place” for X to hit sG when s is large is in the vicinity of the
“corner point” sg. More specifically, we show that the precise asymptotics of the
hitting probability of sG are given by the following expression: letting

\[
\tau(V) := \inf\{t > 0 : X(t) \in V\}
\]

be the first hitting time of the set V ⊂ ℝ^d by the process X, one has

\[
P(\tau(sG) < \infty) = A_0\, s^{-(d-1)/2} e^{-sD(G)} (1 + o(1)) \quad \text{as } s \to \infty, \tag{1}
\]

where the “adjustment coefficient” D(G) is the value of the second rate function
(see (9) below) for the distribution of X(1) on the set G and the constant A_0 ∈
(0, ∞) can be computed explicitly.

The asymptotics (1) extend a number of known results. The main body of 
literature on the topic of precise asymptotics for boundary crossing large deviation 
probabilities in the multivariate case concerns the random walk theory, see [3-5] 
and references therein for an overview of the relevant results. The crude logarithmic 
asymptotics in the multivariate case was also derived independently in [6]. 

The entrance probability to a remote set for Lévy processes was analyzed later, 
usually under some specific assumptions on the structure of these processes. For 
example, paper [1] dealt with the two-dimensional reserve process of the form 


\[
X(t) = (X_1(t), X_2(t)) = (c_1, c_2) \sum_{i=1}^{N(t)} C_i - (p_1, p_2)\,t, \quad t \ge 0, \tag{2}
\]

where c_i, p_i > 0, i = 1, 2, are constants, {C_i}_{i≥1} is a sequence of i.i.d. claim
sizes, and N(t) is a Poisson process independent of the claim sizes. That model
admits the following interpretation: the components of the process sg − X describe
the dynamics of the reserves of two insurance companies that start with the initial
reserves sg_1 and sg_2, respectively, and then divide between them both claims and
premia in some pre-specified proportions. In that case, P(τ(sG) < ∞) corresponds
to the simultaneous ruin probability of the two companies. The main result of the
present paper generalizes the assertion of Theorem 5 of [1] to the case of general 
Lévy processes. One may also wish to mention here the relevant papers [2, 7]. 
2 The Main Result


To state the main result, we will need some notation. For brevity, denote by ξ a
random vector such that

\[
\xi \overset{d}{=} X(1). \tag{3}
\]

Our first condition on X is stated as follows.


[C_1] The distribution of ξ is non-lattice and there is no hyperplane H = {x :
⟨a, x⟩ = c} ⊂ ℝ^d such that P(ξ ∈ H) = 1.


That condition can clearly be re-stated in terms of the covariance matrix of the 
Brownian component and spectral measure of X, although such re-statement will 
not make it more compact nor transparent. 

Next denote by 


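\[
K(\lambda) := \ln \mathbf{E}\, e^{\langle \lambda, \xi \rangle}, \quad \lambda \in \mathbb{R}^d, \tag{4}
\]

the cumulant function of ξ and let

\[
\Theta_\psi := \{\lambda \in \mathbb{R}^d : K(\lambda) < \infty\}
\]

be the set on which the moment generating function of ξ is finite. We will need the
following Cramér moment condition on X:


[C_2] Θ_ψ contains a non-empty open set.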


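The first rate function Λ(α) for the random vector ξ is defined as the Legendre
transform of the cumulant function K:

\[
\Lambda(\alpha) := \sup_{\lambda \in \Theta_\psi} \left( \langle \alpha, \lambda \rangle - K(\lambda) \right), \quad \alpha \in \mathbb{R}^d. \tag{5}
\]

The probabilistic interpretation of the first rate function is given by the following
relation (see e.g. [5]): for any α ∈ ℝ^d,

\[
\Lambda(\alpha) = -\lim_{\varepsilon \to 0} \lim_{n \to \infty} \frac{1}{n} \ln P\left( \frac{X(n)}{n} \in U_\varepsilon(\alpha) \right), \tag{6}
\]

where U_ε(α) is the ε-neighborhood of α. Accordingly, for a set B ⊂ ℝ^d, any point
α ∈ B such that

\[
\Lambda(\alpha) = \Lambda(B) := \inf_{v \in B} \Lambda(v) \tag{7}
\]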
is called a most probable point (MPP) of the set B (cf. relation (11) in [8]). If such
a point α is unique for a given set B, we denote it by

\[
\alpha[B] := \arg\min_{v \in B} \Lambda(v). \tag{8}
\]

Now recall the definition of the second rate function D that was introduced and
studied in [4]: letting D_u(v) := uΛ(v/u) for v ∈ ℝ^d, one sets

\[
D(v) := \inf_{u > 0} D_u(v), \quad v \in \mathbb{R}^d, \qquad
D(B) := \inf_{v \in B} D(v), \quad B \subset \mathbb{R}^d \tag{9}
\]

(see also [8]). Further, we put

\[
r_B := \arg\min_{r > 0} D_{1/r}(B). \tag{10}
\]

Recall the probabilistic meaning of the function D and the value r_B. While the first
rate function Λ specifies the main term in the asymptotics of the probabilities for
the random walk values X(n) to be inside “remote sets” (roughly speaking, Λ(B)
equals the RHS of (6) with the neighbourhood of α in it replaced with B), the second
rate function D does that for the probabilities of ever hitting “remote sets” by the
whole random walk trajectory {X(n)}_{n≥0}, the meaning of r_B being that 1/r_B gives
(after appropriate scaling) the “most probable time” for the walk to hit the respective
remote set. For more detail, we refer the interested reader to [4, 8].
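
To make the two rate functions concrete, here is a one-dimensional sanity check, added purely as an illustration. The paper itself treats d ≥ 2, and the Gaussian choice below is an assumption made for tractability: take ξ ~ N(μ, σ²) with μ < 0, so that X is a Brownian motion with negative drift, and let v > 0.

```latex
\begin{align*}
K(\lambda) &= \mu\lambda + \tfrac{1}{2}\sigma^2\lambda^2, \qquad
\Lambda(\alpha) = \sup_{\lambda}\bigl(\alpha\lambda - K(\lambda)\bigr)
  = \frac{(\alpha - \mu)^2}{2\sigma^2}, \\
D_u(v) &= u\,\Lambda(v/u) = \frac{(v - \mu u)^2}{2\sigma^2 u}, \qquad
\frac{\partial}{\partial u} D_u(v) = 0 \iff u = \frac{v}{|\mu|}, \\
D(v) &= \inf_{u > 0} D_u(v) = \frac{(2v)^2}{2\sigma^2\, v/|\mu|}
  = \frac{2|\mu|\,v}{\sigma^2}.
\end{align*}
```

This agrees with the classical formula P(sup_{t≥0} X(t) ≥ v) = exp(−2|μ|v/σ²) for Brownian motion with drift μ < 0, and the minimiser u = v/|μ| is the time at which the mean path would have to be "tilted" to reach level v, in line with the interpretation of the most probable hitting time given above.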
Define the Cramér range Ω_Λ for ξ as follows:

\[
\Omega_\Lambda := \{\alpha = \operatorname{grad} K(\lambda) : \lambda \in \operatorname{int}(\Theta_\psi)\},
\]

where the cumulant function K(λ) of ξ was defined in (4) and int(B) stands for
the interior of the set B. In words, the set Ω_Λ consists of all the vectors that can
be obtained as the expectations of the Cramér transforms of the law of ξ, i.e. the
distributions of the form e^{⟨λ,x⟩−K(λ)} P(ξ ∈ dx), for parameter values λ ∈ int(Θ_ψ).

For α ∈ ℝ^d, denote by λ(α) the vector λ at which the upper bound in (5) is
attained (when such a vector exists, in which case it is always unique):

\[
\Lambda(\alpha) = \langle \alpha, \lambda(\alpha) \rangle - K(\lambda(\alpha)).
\]

For r > 0, assuming that α[rG] ∈ Ω_Λ, introduce the vector

\[
N(r) := \operatorname{grad} \Lambda(\alpha)\big|_{\alpha = \alpha[rG]} = \lambda(\alpha[rG]), \tag{11}
\]

which is a normal to the level surface of Λ at the point α[rG] (see e.g. (22) in [8]).
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The last condition that we will need to state our main result depends on the 
parameter r > 0 and is formulated as follows: 


C3(r) ~~ One has 
A(rG) = A(rg), rgéQa, N(r)€ Ot, (E&,N(r)) <0. 


The first part of condition [C3(r)] means that the vertex rg is an MPP for the set rG. 
Note that under the second part of the condition, this MPP rg for rG is unique (e.g., 
by Lemma | in [8]). Since N(r) always belongs to the closure of Q*, the third 
part of condition [C3(r)] just excludes the case when the normal N(r) to the level 
surface of A at the point rg belongs to the boundary of the set rG. 


Theorem 1 Let conditions [C1], [C2] and [C3($r_G$)] be met. Then the asymptotic
relation (1) holds true, where $D(G)$ is the value of the second rate function (9) on
$G$ and the constant $A_0 \in (0, \infty)$ can be computed explicitly.


The value of the constant $A_0 \in (0, \infty)$ is given by the limit as $\delta \to 0$ of the
expressions given by formula (68) in [8] for the distribution of $\xi = X(\delta)$. When
proving the theorem below, we demonstrate that that limit does exist and is finite
and positive.


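Proof For a $\delta > 0$, consider the embedded random walk $\{X(n\delta)\}_{n \in \mathbb{N}}$ and, for a set
$V \subset \mathbb{R}^d$, denote the first time that random walk hits that set $V$ by

$$\eta_\delta(V) := \inf\{n \in \mathbb{N} : X(n\delta) \in V\}.$$

First observe that, on the one hand, for any $\delta > 0$, one clearly has

$$\mathbf{P}(\tau(sG) < \infty) \ge \mathbf{P}(\eta_\delta(sG) < \infty). \tag{12}$$

On the other hand, assuming without loss of generality that $\min_{1 \le j \le d} g_j = 1$ and
setting $I(s) := (\tau(sG), \tau(sG) + \delta] \subset \mathbb{R}$ on the event $\{\tau(sG) < \infty\}$, we have, for
any $\varepsilon > 0$,

$$
\begin{aligned}
\mathbf{P}(\eta_\delta((s-\varepsilon)G) < \infty) &\ge \mathbf{P}\Big(\tau(sG) < \infty,\ \sup_{t \in I(s)} \|X(t) - X(\tau(sG))\| < \varepsilon\Big) \\
&= \mathbf{P}(\tau(sG) < \infty)\,\mathbf{P}\Big(\sup_{t \in I(s)} \|X(t) - X(\tau(sG))\| < \varepsilon \,\Big|\, \tau(sG) < \infty\Big) \\
&= \mathbf{P}(\tau(sG) < \infty)\,\mathbf{P}\Big(\sup_{t \in (0,\delta]} \|X(t)\| < \varepsilon\Big),
\end{aligned} \tag{13}
$$

where the last two relations follow from the strong Markov property and homogeneity of $X$.
Now take an arbitrary small $\varepsilon > 0$. As the process $X$ is right-continuous, there
exists a $\delta(\varepsilon) > 0$ such that

$$\mathbf{P}\Big(\sup_{t \in (0,\delta(\varepsilon)]} \|X(t)\| < \varepsilon\Big) > (1+\varepsilon)^{-1},$$

which, together with (13), yields, for any $\delta \in (0, \delta(\varepsilon)]$, the inequality

$$\mathbf{P}(\tau(sG) < \infty) \le (1+\varepsilon)\,\mathbf{P}(\eta_\delta((s-\varepsilon)G) < \infty). \tag{14}$$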


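The precise asymptotics of the probability on the RHS of (12) were obtained
in [8]. It is given in terms of the second rate function $D^{[\delta]}$ for the distribution of the
jumps $X(n\delta) - X((n-1)\delta) \overset{d}{=} X(\delta)$ in the random walk $\{X(n\delta)\}_{n \ge 0}$. Recalling the
well-known fact that the cumulant of $X(\delta)$ is given by $\delta K$, we see that the first rate
function $\Lambda^{[\delta]}$ for $X(\delta)$ equals

$$
\begin{aligned}
\Lambda^{[\delta]}(\alpha) &= \sup_{\lambda \in \Theta_\psi} \big(\langle \alpha, \lambda\rangle - \delta K(\lambda)\big) \\
&= \delta \sup_{\lambda \in \Theta_\psi} \big(\langle \alpha/\delta, \lambda\rangle - K(\lambda)\big) = \delta \Lambda(\alpha/\delta), \qquad \alpha \in \mathbb{R}^d
\end{aligned}
$$

(cf. (5)). Therefore the second rate function (see (9)) $D^{[\delta]}$ for $X(\delta)$ is

$$D^{[\delta]}(v) = \inf_{u>0} u\Lambda^{[\delta]}(v/u) = \inf_{u>0} (u\delta)\,\Lambda\big(v/(u\delta)\big) = D(v), \qquad v \in \mathbb{R}^d.$$

That is, the second rate function for the random walk $\{X(n\delta)\}$ is the same for all
$\delta > 0$, which makes perfect sense as one would expect the same asymptotics for
the probabilities $\mathbf{P}(\eta_\delta(sG) < \infty)$ for different $\delta$. Hence the respective value $r_G^{[\delta]}$
(see (10)) can easily be seen to be given by $\delta r_G$.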
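The invariance $D^{[\delta]} = D$ can be checked numerically in a simple special case. The sketch below is only an illustration under assumptions that are not part of the text: dimension one and a Gaussian cumulant $K(\lambda) = \mu\lambda + \lambda^2/2$ with $\mu < 0$, for which $\Lambda(\alpha) = (\alpha - \mu)^2/2$ and $D(v) = 2|\mu| v$ for $v > 0$. The function names and the grid search are hypothetical helpers.

```python
import math

def first_rate(alpha, mu):
    """Lambda(alpha) for the assumed 1-D Gaussian cumulant K(lam) = mu*lam + lam**2/2."""
    return 0.5 * (alpha - mu) ** 2

def second_rate(v, mu, delta=1.0, n=20001, log_range=5.0):
    """Grid approximation of D^{[delta]}(v) = inf_{u>0} u * Lambda^{[delta]}(v/u),
    using Lambda^{[delta]}(alpha) = delta * Lambda(alpha / delta)."""
    best = math.inf
    for k in range(n):
        # logarithmic grid for u in [exp(-log_range), exp(log_range)]
        u = math.exp(-log_range + 2.0 * log_range * k / (n - 1))
        best = min(best, u * delta * first_rate(v / (u * delta), mu))
    return best

mu, v = -1.0, 2.0
d_cont = second_rate(v, mu)                  # delta = 1
d_embedded = second_rate(v, mu, delta=0.1)   # embedded walk with step 0.1
```

Both values agree with the closed form $2|\mu|v$ up to the grid error, in line with the observation that the second rate function does not depend on $\delta$.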

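Therefore, applying Theorem 1 in [8] to the random walk $\{X(n\delta)\}$ and using
notation $A^{[\delta]}$ for the constant $A$ appearing in that theorem for the said random walk,
we conclude that, for any $\delta \in (0, \delta(\varepsilon)]$, as $s \to \infty$,

$$\mathbf{P}(\eta_\delta(sG) < \infty) = A^{[\delta]} s^{-(d-1)/2} e^{-sD(G)}(1 + o(1)), \tag{15}$$

and likewise

$$\mathbf{P}(\eta_\delta((s-\varepsilon)G) < \infty) = A^{[\delta]} (s-\varepsilon)^{-(d-1)/2} e^{-(s-\varepsilon)D(G)}(1 + o(1)). \tag{16}$$

Now from (12) and (14) we see that, as $s \to \infty$,

$$A^{[\delta]}(1 + o(1)) \le R(s) := \frac{\mathbf{P}(\tau(sG) < \infty)}{s^{-(d-1)/2} e^{-sD(G)}} \le \frac{(1+\varepsilon)\,e^{\varepsilon D(G)} A^{[\delta]}}{(1 - \varepsilon/s)^{(d-1)/2}}\,(1 + o(1)).$$

Therefore, setting $\underline{R} := \liminf_{s \to \infty} R(s)$, $\overline{R} := \limsup_{s \to \infty} R(s)$, we have

$$A^{[\delta]} \le \underline{R} \le \overline{R} \le (1+\varepsilon)\,e^{\varepsilon D(G)} A^{[\delta]}$$

for any $\delta \in (0, \delta(\varepsilon)]$, and hence

$$\limsup_{\delta \to 0} A^{[\delta]} \le \underline{R} \le \overline{R} \le (1+\varepsilon)\,e^{\varepsilon D(G)} \liminf_{\delta \to 0} A^{[\delta]}.$$

As $\varepsilon > 0$ is arbitrary small, we conclude that there exists $\lim_{\delta \to 0} A^{[\delta]} =: A_0 \in (0, \infty)$.
Therefore there also exists

$$\lim_{s \to \infty} R(s) = A_0.$$

The theorem is proved. □


Parisian Excursion Below a Fixed Level
from the Last Record Maximum of Lévy
Insurance Risk Process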


Budhi A. Surya 


Abstract This paper presents some new results on Parisian ruin under a Lévy
insurance risk process, where ruin occurs when the process has gone below a
fixed level from the last record maximum, also known as the high-water mark or
drawdown, for a fixed consecutive period of time. The law of the ruin time and the
position at ruin is given in terms of their joint Laplace transforms. Identities are
presented semi-explicitly in terms of the scale function and the law of the Lévy
process. They are established using recent developments in the fluctuation theory of
the drawdown of spectrally negative Lévy processes. In contrast to the Parisian ruin of
a Lévy process below a fixed level, ruin under drawdown occurs in finite time with
probability one.


1 Introduction 


Let $X = \{X_t : t \ge 0\}$ be a spectrally negative Lévy process defined on a filtered
probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t : t \ge 0\}, \mathbb{P})$, where $\mathcal{F}_t$ is the natural filtration of $X$
satisfying the usual assumptions of right-continuity and completeness. We denote
by $\{\mathbb{P}_x, x \in \mathbb{R}\}$ the family of probability measures corresponding to a translation of
$X$ such that $X_0 = x$, with $\mathbb{P} = \mathbb{P}_0$, and define $\overline{X}_t = \sup_{0 \le s \le t} X_s$, the running maximum
of $X$ up to time $t$. The Lévy-Itô sample paths decomposition of the Lévy process is
given by

$$X_t = \mu t + \sigma B_t + \int_0^t\!\!\int_{\{x < -1\}} x\,\nu(dx, ds) + \int_0^t\!\!\int_{\{-1 \le x < 0\}} x\big(\nu(dx, ds) - \Pi(dx)\,ds\big), \tag{1}$$

where $\mu \in \mathbb{R}$, $\sigma \ge 0$ and $(B_t)_{t \ge 0}$ is standard Brownian motion, whilst $\nu(dx, dt)$
denotes the Poisson random measure associated with the jump process $\Delta X_t := X_t - X_{t-}$
of $X$. This Poisson random measure has compensator given by $\Pi(dx)\,dt$,
where $\Pi$ is the Lévy measure satisfying the integrability condition:

$$\int_{-\infty}^0 (1 \wedge x^2)\,\Pi(dx) < \infty. \tag{2}$$


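We refer to Chap. 2 of [12] for more details on the paths decomposition of $X$.
Due to the absence of positive jumps, it is therefore sensible to define

$$\psi(\lambda) = \frac{1}{t} \log \mathbb{E}\{e^{\lambda X_t}\} = \mu\lambda + \frac{1}{2}\sigma^2\lambda^2 + \int_{(-\infty,0)} \big(e^{\lambda x} - 1 - \lambda x \mathbf{1}_{\{x > -1\}}\big)\,\Pi(dx), \tag{3}$$

which is analytic on $(\Im(\lambda) < 0)$. It is easily shown that $\psi$ is zero at the origin, tends
to infinity at infinity and is strictly convex. We denote by $\Phi : [0, \infty) \to [0, \infty)$ the
right continuous inverse of the Laplace exponent $\psi(\lambda)$, so that

$$\Phi(\theta) = \sup\{p > 0 : \psi(p) = \theta\} \quad \text{and} \quad \psi(\Phi(\theta)) = \theta \quad \text{for all } \theta \ge 0.$$

It is worth mentioning that under the Esscher transform $\mathbb{P}^\nu$ defined by

$$\frac{d\mathbb{P}^\nu}{d\mathbb{P}}\Big|_{\mathcal{F}_t} = e^{\nu X_t - \psi(\nu)t} \quad \text{for all } \nu \ge 0, \tag{4}$$

the Lévy process $(X, \mathbb{P}^\nu)$ is still a spectrally negative Lévy process. The Laplace
exponent of $X$ under the new measure $\mathbb{P}^\nu$ has changed to $\psi_\nu(\lambda)$ given by

$$\psi_\nu(\lambda) = \psi(\lambda + \nu) - \psi(\nu), \quad \text{for } \lambda \ge -\nu. \tag{5}$$

Subsequently, we denote by $\Phi_\nu(\theta)$ the largest root of the equation $\psi_\nu(\lambda) = \theta$, which satisfies
$\Phi_\nu(\theta) = \Phi(\theta + \psi(\nu)) - \nu$.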


Furthermore, assume that from some reference point of time in the past $X$ has
achieved maximum $y > 0$. Define the drawdown process $Y = \{Y_t : t \ge 0\}$ of $X$ by

$$Y_t = \overline{X}_t \vee y - X_t \tag{6}$$

under measure $\mathbb{P}_{y,x}$. Notice that we altered slightly our notation for the probability
measure $\mathbb{P}_{y,x}$ to denote the law of $X$ under which at time zero $X$ has current
maximum $y \ge x$ and position $x \in \mathbb{R}$. We simply write $\mathbb{P}_{|y} := \mathbb{P}_{y,0}$ for the
law of $Y$ under which $Y_0 = y$, and use the notation $\mathbb{E}_x$, $\mathbb{E}_{y,x}$ and $\mathbb{E}_{|y}$ to
define the corresponding expectation operators to the above probability measures.
Subsequently, we denote by $\mathbb{E}^\nu_{y,x}$ the expectation under $\mathbb{P}^\nu_{y,x}$, by which the Lévy
process $X$ has the Laplace exponent $\psi_\nu(\lambda)$ (5). Recall that since $X$ is a Lévy process,
it follows that $Y$ is strong Markov.

In recent developments, some results regarding excursion below a (fixed) default
level, say zero, of the Lévy process $X$ with fixed duration (Parisian ruin) have been
obtained and applied in finance and insurance (e.g. option pricing, corporate finance,
optimal dividend, etc). We refer among others to Chesney et al. [6], Francois and
Morellec [9], Broadie et al. [4], Dassios and Wu [8], Loeffen et al. [16, 17], Czarna
and Palmowski [7] and Landriault et al. [15] and the literature therein for further
discussions. In these papers, the excursion takes effect from the first time
$T_0^- = \inf\{t > 0 : X_t < 0\}$ the process $X$ has gone below zero under measure $\mathbb{P}_x$, and
default is announced at the first time $\tau_r = \inf\{t > r : (t - \sup\{s < t : X_s \ge 0\}) \ge r\}$
the Lévy process has gone below zero for $r > 0$ consecutive periods of time.

In the past decades attention has been paid to finding risk protection mechanisms
against certain financial assets' outperformance over their last record maximum,
also referred to as the high-water mark or drawdown, which in practice may affect
fund managers' compensation. See, among others, Agarwal et al. [1]
and Goetzmann et al. [10] for details. Such risk may be protected against using
an insurance contract. In their recent works, Zhang et al. [21] and Palmowski and
Tumilewicz [19] discussed the fair valuation and design of such insurance contracts.

Motivated by the above works, we consider a Parisian ruin problem, where ruin
occurs when the Lévy risk process $X$ has gone below a fixed level $a > 0$ from its last
record maximum (running maximum) $\overline{X}_t \vee y$ for a fixed consecutive period of time
$r \ge 0$. This excursion takes effect from the first time $\tau_a^+ = \inf\{t > 0 : \overline{X}_t \vee y - a > X_t\}$
the process under $\mathbb{P}_{y,x}$ has gone below a fixed level $a > 0$ from the last record
maximum $\overline{X}_t \vee y$. Equivalently, this stopping time can be written in terms of the first
passage above level $a > 0$ of the drawdown process $Y$ as $\tau_a^+ = \inf\{t > 0 : Y_t > a\}$.
Ruin is declared at the first time the process $Y$ has undertaken an excursion above
level $a$ for $r$ consecutive periods of time before getting down again below $a$, i.e.,

$$\tau_r = \inf\{t > r : (t - g_t) \ge r\} \quad \text{with} \quad g_t = \sup\{0 \le s \le t : Y_s \le a\}. \tag{7}$$
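To make the definition (7) concrete, the following minimal sketch evaluates a grid approximation of the drawdown $Y$ and of the Parisian ruin time on a sampled path. It is an illustration only: the function name, the equally spaced time grid and the convention $g = 0$ before the first visit of $Y$ to $[0, a]$ are assumptions, not taken from the paper.

```python
def parisian_drawdown_ruin(path, dt, y, a, r):
    """Grid approximation of the ruin time in (7).

    path : sampled values X_0, X_dt, X_2dt, ... (hypothetical input)
    y    : maximum already achieved before time zero
    a    : drawdown level, r : length of the Parisian window
    Returns the first grid time t with t - g_t >= r, or None if no ruin
    is observed on the sampled horizon.
    """
    running_max = y
    last_at_or_below = 0.0           # plays the role of g_t
    for i, x in enumerate(path):
        t = i * dt
        running_max = max(running_max, x)
        drawdown = running_max - x   # Y_t = (running max of X) v y - X_t
        if drawdown <= a:
            last_at_or_below = t     # excursion above a is reset
        elif t - last_at_or_below >= r:
            return t                 # excursion above a has lasted r
    return None

path = [0.0, 1.0, 0.5, -1.0, -1.2, -1.1, 0.2, -1.0, -1.0, -1.0, -1.0]
ruin_short = parisian_drawdown_ruin(path, dt=1.0, y=0.0, a=1.5, r=2.0)
ruin_long = parisian_drawdown_ruin(path, dt=1.0, y=0.0, a=1.5, r=3.5)
ruin_none = parisian_drawdown_ruin(path, dt=1.0, y=0.0, a=1.5, r=10.0)
```

In this toy path the first excursion of the drawdown above $a = 1.5$ lasts long enough for $r = 2$ but not for $r = 3.5$; in the latter case the clock restarts when the drawdown returns below $a$, and ruin is only declared during the second excursion.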


Working with the stopping time $\tau_r$ (7), we consider the Laplace transforms

$$\mathbb{E}_{y,x}\{e^{-u\tau_r}\mathbf{1}_{\{\tau_r < \infty\}}\} \quad \text{and} \quad \mathbb{E}_{y,x}\{e^{-u\tau_r + \nu X_{\tau_r}}\mathbf{1}_{\{\tau_r < \infty\}}\}, \tag{8}$$

for $u, \nu, r \ge 0$ and $y \ge x$. The first quantity gives the law of the ruin time $\tau_r$,
whereas the second describes the joint law of the ruin time $\tau_r$ and the position at
ruin $X_{\tau_r}$.

The rest of this paper is organized as follows. Section 2 presents the main results
of this paper. Some preliminary results are presented in Sect. 3. Section 4 discusses
the proofs of the main results. Section 5 concludes this paper.


2 Main Results


The results are expressed in terms of the scale function $W^{(u)}(x)$ of $X$ defined by

$$\int_0^\infty e^{-\lambda x} W^{(u)}(x)\,dx = \frac{1}{\psi(\lambda) - u}, \quad \text{for } \lambda > \Phi(u), \tag{9}$$

with $W^{(u)}(x) = 0$ for $x < 0$. We write $W_\nu^{(u)}$ for the scale function under $\mathbb{P}^\nu$.
Following (9), it is straightforward to check under the new measure $\mathbb{P}^\nu$ that

$$W_\nu^{(u)}(x) = e^{-\nu x} W^{(u + \psi(\nu))}(x), \tag{10}$$

for all $u$ and $\nu$ such that $u \ge -\psi(\nu)$ and $\psi(\nu) < \infty$. To see this, take
Laplace transforms on both sides. We will also use the notation $\overline{W}^{(u)}(x)$ to denote
$\int_0^x W^{(u)}(y)\,dy$.

It is known following [14] and [5] that, for any $u \ge 0$, the $u$-scale function
$W^{(u)}$ is $C^1(0, \infty)$ if the Lévy measure $\Pi$ does not have atoms and is $C^2(0, \infty)$ if
$\sigma > 0$. For further details on spectrally negative Lévy processes, we refer to Chap. 6
of Bertoin [3] and Chap. 8 of Kyprianou [12]. Some examples of Lévy processes for
which $W^{(u)}$ is available in explicit form are given by Kuznetsov et al. [11]. In any
case, it can be computed by numerically inverting (9), see e.g. Surya [20].
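As a sanity check of the defining relation (9), one can take a case where the scale function is known in closed form. The sketch below assumes, purely for illustration, that $X$ is a Brownian motion with drift ($\Pi = 0$, so $\psi(\lambda) = \mu\lambda + \sigma^2\lambda^2/2$ with $\sigma > 0$), for which $W^{(u)}(x) = (e^{\rho_+ x} - e^{\rho_- x})/\sqrt{\mu^2 + 2u\sigma^2}$ with $\rho_\pm$ the two roots of $\psi(\lambda) = u$. The helper names and the trapezoidal quadrature are hypothetical choices, not the inversion method of [20].

```python
import math

def scale_function(x, u, mu, sigma):
    """W^{(u)}(x) for Brownian motion with drift (assumed example, sigma > 0)."""
    if x < 0:
        return 0.0
    disc = math.sqrt(mu * mu + 2.0 * u * sigma * sigma)
    rho_plus = (-mu + disc) / sigma**2    # equals Phi(u)
    rho_minus = (-mu - disc) / sigma**2
    return (math.exp(rho_plus * x) - math.exp(rho_minus * x)) / disc

def laplace_exponent(lam, mu, sigma):
    """psi(lambda) = mu*lambda + sigma^2*lambda^2/2 (no jumps)."""
    return mu * lam + 0.5 * sigma**2 * lam**2

def laplace_transform_of_scale(lam, u, mu, sigma, upper=40.0, n=40000):
    """Trapezoidal approximation of int_0^upper exp(-lam*x) W^{(u)}(x) dx."""
    h = upper / n
    total = 0.0
    for k in range(n + 1):
        x = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-lam * x) * scale_function(x, u, mu, sigma)
    return total * h

mu, sigma, u = 1.0, 1.0, 0.5
phi_u = (-mu + math.sqrt(mu * mu + 2.0 * u * sigma**2)) / sigma**2
lam = phi_u + 1.0                          # (9) requires lambda > Phi(u)
lhs = laplace_transform_of_scale(lam, u, mu, sigma)
rhs = 1.0 / (laplace_exponent(lam, mu, sigma) - u)
```

The two sides agree to quadrature accuracy. In this example one also finds $W^{(u)\prime}(0+) = (\rho_+ - \rho_-)/\sqrt{\mu^2 + 2u\sigma^2} = 2/\sigma^2$.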

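In the sequel below, we will use the notation $\Omega_\varepsilon^{(u)}(x, t)$ defined by

$$\Omega_\varepsilon^{(u)}(x, t) = \int_\varepsilon^\infty W^{(u)}(z + x - \varepsilon)\,\frac{z}{t}\,\mathbb{P}\{X_t \in dz\}, \quad \text{for } \varepsilon \ge 0,$$

and denote its partial derivative w.r.t. $x$, $\frac{\partial}{\partial x}\Omega_\varepsilon^{(u)}(x, t)$, by $\Lambda_\varepsilon^{(u)}(x, t)$, i.e.,

$$\Lambda_\varepsilon^{(u)}(x, t) = \int_\varepsilon^\infty W^{(u)\prime}(z + x - \varepsilon)\,\frac{z}{t}\,\mathbb{P}\{X_t \in dz\}.$$

For convenience, we write $\Omega^{(u)}(x, t) = \Omega_0^{(u)}(x, t)$ and $\Lambda^{(u)}(x, t) = \Lambda_0^{(u)}(x, t)$.
We denote by $\Omega_\nu^{(u)}$ the counterpart of $\Omega^{(u)}$ under the change of measure $\mathbb{P}^\nu$, i.e.,

$$\Omega_\nu^{(u)}(x, t) = \int_0^\infty W_\nu^{(u)}(z + x)\,\frac{z}{t}\,\mathbb{P}^\nu\{X_t \in dz\}, \tag{11}$$

with $\Lambda_\nu^{(u)}(x, t)$ defined similarly. Using (4), we can rewrite $\Omega_\nu^{(u)}(x, t)$ as follows:

$$\Omega_\nu^{(u)}(x, t) = e^{-\nu x - \psi(\nu)t}\,\Omega^{(u + \psi(\nu))}(x, t). \tag{12}$$

The main result concerning the Laplace transform (8) is given below.

Theorem 1 Define $z = y - x$, with $y \ge x$. For $a > 0$ and $u, r \ge 0$, the Laplace
transform of $\tau_r$ is given by

$$
\begin{aligned}
\mathbb{E}_{y,x}\{e^{-u\tau_r}\mathbf{1}_{\{\tau_r < \infty\}}\} = e^{-ur}\Big\{1 + u\Big[&\overline{W}^{(u)}(a - z) - \frac{\Omega^{(u)}(a - z, r)}{\Lambda^{(u)}(a, r)}\,W^{(u)}(a) \\
&+ \int_0^r \Big(\Omega^{(u)}(a - z, t) - \frac{\Omega^{(u)}(a - z, r)}{\Lambda^{(u)}(a, r)}\,\Lambda^{(u)}(a, t)\Big)\,dt\Big]\Big\}.
\end{aligned} \tag{13}
$$

By inserting $u = 0$ in (13), we see that, in contrast to the Parisian ruin probability
under the Lévy process $X$, see e.g. [16], we have the following result.


Corollary 1 For $y \ge x$ and $r \ge 0$, $\mathbb{P}_{y,x}\{\tau_r < \infty\} = 1$.


Following the result of Theorem 1 and applying the Esscher transform of measure,
the joint law of the ruin time $\tau_r$ and the position at ruin $X_{\tau_r}$ is given below.


Proposition 1 Define $z = y - x$, with $y \ge x$, and $p = u - \psi(\nu)$, with $u \ge 0$ and
$\nu$ such that $\psi(\nu) < \infty$. For $a > 0$ and $r \ge 0$, the joint Laplace transform of $\tau_r$ and
$X_{\tau_r}$ is given by

$$
\begin{aligned}
\mathbb{E}_{y,x}\{e^{-u\tau_r + \nu X_{\tau_r}}\mathbf{1}_{\{\tau_r < \infty\}}\} = e^{\nu x - pr}\Big\{1 + p\Big[&\overline{W}_\nu^{(p)}(a - z) - \frac{\Omega_\nu^{(p)}(a - z, r)}{\Lambda_\nu^{(p)}(a, r)}\,W_\nu^{(p)}(a) \\
&+ \int_0^r \Big(\Omega_\nu^{(p)}(a - z, t) - \frac{\Omega_\nu^{(p)}(a - z, r)}{\Lambda_\nu^{(p)}(a, r)}\,\Lambda_\nu^{(p)}(a, t)\Big)\,dt\Big]\Big\}.
\end{aligned} \tag{14}
$$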


3 Preliminaries 


Before we prove the main results, we devote this section to some preliminary results
required to establish (13)-(14); in particular, Theorem 1 on the Laplace transform
of $\tau_r$. By spatial homogeneity of the sample paths of $X$, we establish Theorem 1
under the measure $\mathbb{P}_{|y}$. To begin with, we define for $a > 0$ the stopping times:

$$\tau_a^+ = \inf\{t > 0 : Y_t > a\} \quad \text{and} \quad \tau_a^- = \inf\{t > 0 : Y_t < a\} \quad \text{under } \mathbb{P}_{|y}. \tag{15}$$

Due to the absence of positive jumps, we have by the strong Markov property of $X$
that $\tau_a^-$ can equivalently be rewritten as $\tau_a^- = \inf\{t > 0 : Y_t \le a\}$ and that

$$\mathbb{E}_{|y}\{e^{-\theta\tau_a^-}\} = e^{-\Phi(\theta)(y - a)}. \tag{16}$$

This is due to the fact that $\tau_a^- < \tau_{\{0\}}$ a.s., with $\tau_{\{0\}} = \inf\{t > 0 : Y_t = 0\}$,
and that $\{Y_t, t < \tau_{\{0\}}, \mathbb{P}_{y,x}\} = \{-X_t, t < T_0^+, \mathbb{P}_{-z}\}$ with $z = y - x$, where
$T_a^+ = \inf\{t > 0 : X_t \ge a\}$, $a \ge 0$. We refer to Avram et al. [2] and Mijatović and
Pistorius [18].

In the derivation of the main results (13)-(14), we will also frequently apply
Kendall's identity (see e.g. Corollary VII.3 in [3]), which relates the distribution
$\mathbb{P}\{X_t \in dx\}$ of a spectrally negative Lévy process $X$ to the distribution $\mathbb{P}\{T_x^+ \in dt\}$
of its first passage time $T_x^+$ above $x > 0$ under $\mathbb{P}$. This identity is given by

$$t\,\mathbb{P}\{T_x^+ \in dt\}\,dx = x\,\mathbb{P}\{X_t \in dx\}\,dt. \tag{17}$$


To establish our main results, we need to recall the following identities. 


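Lemma 1 Define $s = y - x$, with $y \ge x$. For $a > 0$, $u \ge 0$ and $\nu$ such that
$\psi(\nu) < \infty$, the joint Laplace transform of $\tau_a^+$ and $Y_{\tau_a^+}$ is given by

$$
\begin{aligned}
\mathbb{E}_{y,x}\{e^{-u\tau_a^+ - \nu Y_{\tau_a^+}}\mathbf{1}_{\{\tau_a^+ < \infty\}}\} ={}& e^{-\nu s}\Big(1 + (u - \psi(\nu))\int_0^{a-s} e^{-\nu z} W^{(u)}(z)\,dz\Big) \\
&- \frac{W^{(u)}(a - s)}{W^{(u)\prime}(a)}\Big[\nu - (\psi(\nu) - u)e^{-\nu a} W^{(u)}(a) - \nu(\psi(\nu) - u)\int_0^a e^{-\nu z} W^{(u)}(z)\,dz\Big].
\end{aligned} \tag{18}
$$

The identity (18) is due to Theorem 1 in Avram et al. [2] taking account of (9)-(10).

Corollary 2 Define $s = y - x$, with $y \ge x$. For $a > 0$ and $u, \theta \ge 0$ with $\theta > u$,

$$
\begin{aligned}
\mathbb{E}_{y,x}\{e^{-u\tau_a^+ - \Phi(\theta) Y_{\tau_a^+}}\mathbf{1}_{\{\tau_a^+ < \infty\}}\} ={}& (\theta - u)\,e^{-\Phi(\theta)s}\int_{a-s}^\infty e^{-\Phi(\theta)z} W^{(u)}(z)\,dz \\
&- \frac{W^{(u)}(a - s)}{W^{(u)\prime}(a)}\Big[\Phi(\theta) + (u - \theta)e^{-\Phi(\theta)a} W^{(u)}(a) + (u - \theta)\Phi(\theta)\int_0^a e^{-\Phi(\theta)z} W^{(u)}(z)\,dz\Big].
\end{aligned} \tag{19}
$$

Proof The result follows from inserting $\nu = \Phi(\theta)$ in Eq. (18) and taking account
that $\psi(\Phi(\theta)) = \theta$, and $\int_0^{a-s} e^{-\nu z} W^{(u)}(z)\,dz = \frac{1}{\psi(\nu) - u} - \int_{a-s}^\infty e^{-\nu z} W^{(u)}(z)\,dz$. □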

Along with Lemma | and Corollary 2, the three results below are used when 
applying inverse Laplace transforms to get the main results (13)—(14). 


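Lemma 2 For a given $\theta > 0$ and $\alpha$ such that $\alpha < \Phi(\theta)$, we have for $y \in \mathbb{R}_+$,

$$\int_0^\infty e^{-\theta t}\int_y^\infty e^{\alpha z}\,\frac{z}{t}\,\mathbb{P}\{X_t \in dz\}\,dt = \frac{e^{-(\Phi(\theta) - \alpha)y}}{\Phi(\theta) - \alpha}, \tag{20}$$

$$\int_0^\infty e^{-\theta t}\int_0^t\!\!\int_y^\infty e^{\alpha z}\,\frac{z}{u}\,\mathbb{P}\{X_u \in dz\}\,du\,dt = \frac{e^{-(\Phi(\theta) - \alpha)y}}{\theta(\Phi(\theta) - \alpha)}, \tag{21}$$

$$\int_0^\infty W^{(u)}(z)\,\frac{z}{t}\,\mathbb{P}\{X_t \in dz\} = e^{ut}, \quad \text{for } u \ge 0 \text{ and } t > 0. \tag{22}$$

The results above are slight generalizations of those given in [16] and can be
proved in a similar fashion to [16] using Kendall's identity (17) and Tonelli.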
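Identity (22) can be verified numerically in a case where both the scale function and the law of $X_t$ are explicit. The sketch below again assumes, for illustration only, a Brownian motion with drift ($\Pi = 0$, $\sigma > 0$), so that $X_t$ is Gaussian with mean $\mu t$ and variance $\sigma^2 t$ and $W^{(u)}$ has the closed form used in the code. The helper names and the trapezoidal rule are hypothetical choices.

```python
import math

def scale_function(x, u, mu, sigma):
    """W^{(u)}(x) for Brownian motion with drift (assumed example, sigma > 0)."""
    if x < 0:
        return 0.0
    disc = math.sqrt(mu * mu + 2.0 * u * sigma * sigma)
    rho_plus = (-mu + disc) / sigma**2
    rho_minus = (-mu - disc) / sigma**2
    return (math.exp(rho_plus * x) - math.exp(rho_minus * x)) / disc

def gaussian_density(z, t, mu, sigma):
    """Density of X_t ~ N(mu*t, sigma^2*t)."""
    var = sigma**2 * t
    return math.exp(-((z - mu * t) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def left_side_of_22(t, u, mu, sigma, upper=40.0, n=40000):
    """Trapezoidal approximation of int_0^inf W^{(u)}(z) (z/t) P{X_t in dz}."""
    h = upper / n
    total = 0.0
    for k in range(n + 1):
        z = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * scale_function(z, u, mu, sigma) * (z / t) * gaussian_density(z, t, mu, sigma)
    return total * h

mu, sigma, u, t = 1.0, 1.0, 0.5, 1.0
lhs = left_side_of_22(t, u, mu, sigma)
rhs = math.exp(u * t)
```

For these parameter values the quadrature reproduces $e^{ut}$ to high accuracy, which matches (22).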


4 Proof of the Main Results


4.1 Proof of Theorem 1


The proof is established for the case where $X$ has paths of bounded and unbounded
variation. To deal with the unbounded variation case, we will use a limiting argument
similar to the one employed in [16, 17] and adjust the ruin time (7) accordingly. For
this reason, we introduce for $\varepsilon \ge 0$ the stopping time $\tau_r^\varepsilon$ defined by

$$\tau_r^\varepsilon = \inf\{t > r : (t - g_t^\varepsilon) \ge r\} \quad \text{with} \quad g_t^\varepsilon := \sup\{0 \le s \le t : Y_s \le a - \varepsilon\}.$$

This stopping time represents the first time that the Lévy insurance risk process
$X$ has spent a fixed $r \ge 0$ units of time consecutively below a pre-specified level
$a > 0$ from its running maximum $\overline{X}_t \vee y$, ending before $X$ gets back up again to
a level $a - \varepsilon \ge 0$ below the running maximum. Note that $\tau_r = \tau_r^0$.

By spatial homogeneity of $X$, the proof is given under measure $\mathbb{P}_{|y}$, by which $X$
starts at point zero and has current maximum $y$. We have for any $y > a$ that

$$\mathbb{E}_{|y}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}}\} = e^{-ur}\,\mathbb{P}_{|y}\{\tau_{a-\varepsilon}^- > r\} + \mathbb{E}_{|y}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty,\ \tau_{a-\varepsilon}^- \le r\}}\}.$$


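By the strong Markov property of the drawdown process $Y$ (6), the second
expectation can be worked out using the tower property of conditional expectation,

$$
\begin{aligned}
\mathbb{E}_{|y}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty,\ \tau_{a-\varepsilon}^- \le r\}}\} &= \mathbb{E}_{|y}\big\{\mathbb{E}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty,\ \tau_{a-\varepsilon}^- \le r\}} \,|\, \mathcal{F}_{\tau_{a-\varepsilon}^-}\}\big\} \\
&= \mathbb{E}_{|y}\big\{e^{-u\tau_{a-\varepsilon}^-}\mathbf{1}_{\{\tau_{a-\varepsilon}^- \le r\}}\,\mathbb{E}_{|Y_{\tau_{a-\varepsilon}^-}}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}}\}\big\} \\
&= \mathbb{E}_{|y}\{e^{-u\tau_{a-\varepsilon}^-}\mathbf{1}_{\{\tau_{a-\varepsilon}^- \le r\}}\}\,\mathbb{E}_{|a-\varepsilon}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}}\},
\end{aligned}
$$

where the last equality is due to the absence of positive jumps of $X$. Hence,

$$
\begin{aligned}
\mathbb{E}_{|y}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}}\} ={}& e^{-ur}\big(1 - \mathbb{P}_{|y}\{\tau_{a-\varepsilon}^- \le r\}\big) \\
&+ \mathbb{E}_{|y}\{e^{-u\tau_{a-\varepsilon}^-}\mathbf{1}_{\{\tau_{a-\varepsilon}^- \le r\}}\}\,\mathbb{E}_{|a-\varepsilon}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}}\}.
\end{aligned} \tag{23}
$$

Following the above, for $y < a$ we have by the strong Markov property of $Y$ that

$$
\begin{aligned}
\mathbb{E}_{|y}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}}\} ={}& \mathbb{E}_{|y}\big\{\mathbb{E}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}} \,|\, \mathcal{F}_{\tau_a^+}\}\big\} \\
={}& \mathbb{E}_{|y}\big\{e^{-u\tau_a^+}\mathbf{1}_{\{\tau_a^+ < \infty\}}\,\mathbb{E}_{|Y_{\tau_a^+}}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}}\}\big\} \\
={}& e^{-ur}\,\mathbb{E}_{|y}\{e^{-u\tau_a^+}\mathbf{1}_{\{\tau_a^+ < \infty\}}\} - e^{-ur}\,\mathbb{E}_{|y}\big\{e^{-u\tau_a^+}\mathbf{1}_{\{\tau_a^+ < \infty\}}\,\mathbb{P}_{|Y_{\tau_a^+}}\{\tau_{a-\varepsilon}^- \le r\}\big\} \\
&+ \mathbb{E}_{|y}\big\{e^{-u\tau_a^+}\mathbf{1}_{\{\tau_a^+ < \infty\}}\,\mathbb{E}_{|Y_{\tau_a^+}}\{e^{-u\tau_{a-\varepsilon}^-}\mathbf{1}_{\{\tau_{a-\varepsilon}^- \le r\}}\}\big\}\,\mathbb{E}_{|a-\varepsilon}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}}\}.
\end{aligned} \tag{24}
$$

The first expectation in the last equality of (24) can be worked out in terms of
the scale function $W^{(u)}(x)$ using identity (18), whereas the second and the third
expectations are given by the following propositions. To establish the results, we
denote throughout by $\mathbf{e}_\theta$ an exponential random time with parameter $\theta$, independent
of $X$.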


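Proposition 2 For given $u, r, \varepsilon \ge 0$ and $a > 0$, we have for any $y \ge 0$ that

$$
\begin{aligned}
\mathbb{E}_{|y}\big\{e^{-u\tau_a^+}\mathbf{1}_{\{\tau_a^+ < \infty\}}\,\mathbb{P}_{|Y_{\tau_a^+}}\{\tau_{a-\varepsilon}^- \le r\}\big\} ={}& \Omega_\varepsilon^{(u)}(a - y, r) - u\int_0^r \Omega_\varepsilon^{(u)}(a - y, t)\,dt \\
&- \frac{W^{(u)}(a - y)}{W^{(u)\prime}(a)}\Big(\Lambda_\varepsilon^{(u)}(a, r) - u\int_0^r \Lambda_\varepsilon^{(u)}(a, t)\,dt\Big).
\end{aligned} \tag{25}
$$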
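Proof On recalling (16), we have by Tonelli, Lemma 1 and Corollary 2,

$$
\begin{aligned}
\int_0^\infty dr\,e^{-\theta r}\,\mathbb{E}_{|y}\big\{e^{-u\tau_a^+}\mathbf{1}_{\{\tau_a^+ < \infty\}}\,\mathbb{P}_{|Y_{\tau_a^+}}\{\tau_{a-\varepsilon}^- \le r\}\big\} &= \frac{1}{\theta}\,\mathbb{E}_{|y}\big\{e^{-u\tau_a^+}\mathbf{1}_{\{\tau_a^+ < \infty\}}\,\mathbb{P}_{|Y_{\tau_a^+}}\{\tau_{a-\varepsilon}^- \le \mathbf{e}_\theta\}\big\} \\
&= \frac{1}{\theta}\,e^{\Phi(\theta)(a - \varepsilon)}\,\mathbb{E}_{|y}\{e^{-u\tau_a^+ - \Phi(\theta)Y_{\tau_a^+}}\mathbf{1}_{\{\tau_a^+ < \infty\}}\}.
\end{aligned} \tag{26}
$$

Furthermore, observe following the result of Corollary 2 that for $\theta > u$ we have

$$
\begin{aligned}
\frac{1}{\theta}\,\mathbb{E}_{|y}\{e^{-u\tau_a^+ - \Phi(\theta)Y_{\tau_a^+}}\mathbf{1}_{\{\tau_a^+ < \infty\}}\} ={}& \frac{(\theta - u)}{\theta}\,e^{-\Phi(\theta)a}\int_0^\infty e^{-\Phi(\theta)z} W^{(u)}(z + a - y)\,dz \\
&- \frac{W^{(u)}(a - y)}{W^{(u)\prime}(a)}\Big[\frac{(\theta - u)}{\theta}\,e^{-\Phi(\theta)a}\int_0^\infty e^{-\Phi(\theta)z} W^{(u)\prime}(z + a)\,dz\Big].
\end{aligned}
$$

Define $F(x, r) = \int_x^\infty \frac{z}{r}\,\mathbb{P}\{X_r \in dz\}$. Following the above, we have from (26), after an integration by parts in the first integral, that

$$
\begin{aligned}
\int_0^\infty dr\,e^{-\theta r}\,\mathbb{E}_{|y}\big\{e^{-u\tau_a^+}\mathbf{1}_{\{\tau_a^+ < \infty\}}\,\mathbb{P}_{|Y_{\tau_a^+}}\{\tau_{a-\varepsilon}^- \le r\}\big\} ={}& \frac{(\theta - u)}{\theta\Phi(\theta)}\,e^{-\Phi(\theta)\varepsilon}\,W^{(u)}(a - y) \\
&+ \frac{(\theta - u)}{\theta\Phi(\theta)}\,e^{-\Phi(\theta)\varepsilon}\int_0^\infty e^{-\Phi(\theta)z} W^{(u)\prime}(z + a - y)\,dz \\
&- \frac{W^{(u)}(a - y)}{W^{(u)\prime}(a)}\,\frac{(\theta - u)}{\theta}\,e^{-\Phi(\theta)\varepsilon}\int_0^\infty e^{-\Phi(\theta)z} W^{(u)\prime}(z + a)\,dz.
\end{aligned}
$$

Next, recall following (20)-(21), (16) and Kendall's identity (17) that

$$\frac{(\theta - u)}{\theta\Phi(\theta)}\,e^{-\Phi(\theta)x} = \int_0^\infty dr\,e^{-\theta r}\Big(F(x, r) - u\int_0^r F(x, t)\,dt\Big),$$

$$\int_0^\infty \Big(1 - \frac{u}{\theta}\Big)e^{-\Phi(\theta)(z + \varepsilon)}\,W^{(u)\prime}(z + a)\,dz = \int_0^\infty dr\,e^{-\theta r}\Big(\Lambda_\varepsilon^{(u)}(a, r) - u\int_0^r \Lambda_\varepsilon^{(u)}(a, t)\,dt\Big).$$

Moreover, by applying integration by parts we have after some calculations that

$$\int_0^\infty dz\,W^{(u)\prime}(z + x)\,F(z + \varepsilon, t) = \Omega_\varepsilon^{(u)}(x, t) - W^{(u)}(x)\,F(\varepsilon, t). \tag{27}$$

The claim in (25) is established following the above and by applying Tonelli and Laplace
inversion to (26) (noting that both sides of (26) are right-continuous in $r$). □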


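Proposition 3 For given $u, r, \varepsilon \ge 0$ and $a > 0$, we have for any $y \ge 0$ that

$$\mathbb{E}_{|y}\big\{e^{-u\tau_a^+}\mathbf{1}_{\{\tau_a^+ < \infty\}}\,\mathbb{E}_{|Y_{\tau_a^+}}\{e^{-u\tau_{a-\varepsilon}^-}\mathbf{1}_{\{\tau_{a-\varepsilon}^- \le r\}}\}\big\} = e^{-ur}\Big(\Omega_\varepsilon^{(u)}(a - y, r) - \frac{W^{(u)}(a - y)}{W^{(u)\prime}(a)}\,\Lambda_\varepsilon^{(u)}(a, r)\Big). \tag{28}$$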


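Proof On recalling (16), we have by Tonelli, Lemma 1 and Corollary 2,

$$
\begin{aligned}
\int_0^\infty dr\,e^{-\theta r}\,\mathbb{E}_{|y}\big\{e^{-u\tau_a^+}\mathbf{1}_{\{\tau_a^+ < \infty\}}\,\mathbb{E}_{|Y_{\tau_a^+}}\{e^{-u\tau_{a-\varepsilon}^-}\mathbf{1}_{\{\tau_{a-\varepsilon}^- \le r\}}\}\big\} &= \frac{1}{\theta}\,\mathbb{E}_{|y}\big\{e^{-u\tau_a^+}\mathbf{1}_{\{\tau_a^+ < \infty\}}\,\mathbb{E}_{|Y_{\tau_a^+}}\{e^{-u\tau_{a-\varepsilon}^-}\mathbf{1}_{\{\tau_{a-\varepsilon}^- \le \mathbf{e}_\theta\}}\}\big\} \\
&= \frac{1}{\theta}\,e^{\Phi(\theta + u)(a - \varepsilon)}\,\mathbb{E}_{|y}\{e^{-u\tau_a^+ - \Phi(\theta + u)Y_{\tau_a^+}}\mathbf{1}_{\{\tau_a^+ < \infty\}}\}.
\end{aligned} \tag{29}
$$

From Corollary 2, the expectation on the right hand side is given by

$$
\begin{aligned}
\mathbb{E}_{|y}\{e^{-u\tau_a^+ - \Phi(\theta + u)Y_{\tau_a^+}}\mathbf{1}_{\{\tau_a^+ < \infty\}}\} ={}& \frac{\theta}{\Phi(\theta + u)}\,e^{-\Phi(\theta + u)a}\,W^{(u)}(a - y) \\
&+ \frac{\theta}{\Phi(\theta + u)}\,e^{-\Phi(\theta + u)a}\int_0^\infty e^{-\Phi(\theta + u)z} W^{(u)\prime}(z + a - y)\,dz \\
&- \frac{W^{(u)}(a - y)}{W^{(u)\prime}(a)}\,\theta\,e^{-\Phi(\theta + u)a}\int_0^\infty e^{-\Phi(\theta + u)z} W^{(u)\prime}(z + a)\,dz.
\end{aligned}
$$

Following the above, we have from the Laplace transform (29) that

$$
\begin{aligned}
\int_0^\infty dr\,e^{-\theta r}\,\mathbb{E}_{|y}\big\{e^{-u\tau_a^+}\mathbf{1}_{\{\tau_a^+ < \infty\}}\,\mathbb{E}_{|Y_{\tau_a^+}}\{e^{-u\tau_{a-\varepsilon}^-}\mathbf{1}_{\{\tau_{a-\varepsilon}^- \le r\}}\}\big\} ={}& \frac{1}{\Phi(\theta + u)}\,e^{-\Phi(\theta + u)\varepsilon}\,W^{(u)}(a - y) \\
&+ \frac{1}{\Phi(\theta + u)}\,e^{-\Phi(\theta + u)\varepsilon}\int_0^\infty e^{-\Phi(\theta + u)z} W^{(u)\prime}(z + a - y)\,dz \\
&- \frac{W^{(u)}(a - y)}{W^{(u)\prime}(a)}\,e^{-\Phi(\theta + u)\varepsilon}\int_0^\infty e^{-\Phi(\theta + u)z} W^{(u)\prime}(z + a)\,dz.
\end{aligned} \tag{30}
$$

Moreover, following (16), we have by applying Kendall's identity and (20)

$$\frac{1}{\Phi(\theta + u)}\,e^{-\Phi(\theta + u)x} = \int_0^\infty dr\,e^{-\theta r}\,e^{-ur}\,F(x, r),$$

$$\int_0^\infty e^{-\Phi(\theta + u)(z + \varepsilon)}\,W^{(u)\prime}(z + a)\,dz = \int_0^\infty dr\,e^{-\theta r}\,e^{-ur}\,\Lambda_\varepsilon^{(u)}(a, r),$$

where $F(x, r) = \int_x^\infty \frac{z}{r}\,\mathbb{P}\{X_r \in dz\}$.
The claim (28) is justified using the above and (27) and by Tonelli and Laplace
inversion of (30), noting that both sides of (30) are right-continuous in $r$. □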


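From the above two propositions, we have following (24) and (18) that

$$
\begin{aligned}
\mathbb{E}_{|y}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}}\} ={}& e^{-ur}\Big[1 + u\overline{W}^{(u)}(a - y) - u\frac{W^{(u)}(a)}{W^{(u)\prime}(a)}\,W^{(u)}(a - y)\Big] \\
&- e^{-ur}\Big[\Omega_\varepsilon^{(u)}(a - y, r) - u\int_0^r \Omega_\varepsilon^{(u)}(a - y, t)\,dt \\
&\qquad\quad - \frac{W^{(u)}(a - y)}{W^{(u)\prime}(a)}\Big(\Lambda_\varepsilon^{(u)}(a, r) - u\int_0^r \Lambda_\varepsilon^{(u)}(a, t)\,dt\Big)\Big] \\
&+ e^{-ur}\Big(\Omega_\varepsilon^{(u)}(a - y, r) - \frac{W^{(u)}(a - y)}{W^{(u)\prime}(a)}\,\Lambda_\varepsilon^{(u)}(a, r)\Big)\,\mathbb{E}_{|a-\varepsilon}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}}\}.
\end{aligned} \tag{31}
$$

We arrive at our claim (13) once the expectation on the right hand side is found.
For this purpose, set $y = a - \varepsilon$ on both sides of the above equation to get

$$
\begin{aligned}
\Big[1 - e^{-ur}\Big(&\Omega_\varepsilon^{(u)}(\varepsilon, r) - \frac{W^{(u)}(\varepsilon)}{W^{(u)\prime}(a)}\,\Lambda_\varepsilon^{(u)}(a, r)\Big)\Big]\,\mathbb{E}_{|a-\varepsilon}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}}\} \\
={}& e^{-ur}\Big[1 + u\overline{W}^{(u)}(\varepsilon) - u\frac{W^{(u)}(a)}{W^{(u)\prime}(a)}\,W^{(u)}(\varepsilon) - \Omega_\varepsilon^{(u)}(\varepsilon, r) + u\int_0^r \Omega_\varepsilon^{(u)}(\varepsilon, t)\,dt \\
&\qquad + \frac{W^{(u)}(\varepsilon)}{W^{(u)\prime}(a)}\Big(\Lambda_\varepsilon^{(u)}(a, r) - u\int_0^r \Lambda_\varepsilon^{(u)}(a, t)\,dt\Big)\Big].
\end{aligned} \tag{32}
$$

However, on account of (22), we can rewrite the terms $\Omega_\varepsilon^{(u)}(\varepsilon, t)$ as follows

$$\Omega_\varepsilon^{(u)}(\varepsilon, t) = e^{ut} - \int_0^\varepsilon W^{(u)}(z)\,\frac{z}{t}\,\mathbb{P}\{X_t \in dz\},$$

from which the Eq. (32) simplifies further after some calculations to

$$
\begin{aligned}
\Big[\int_0^\varepsilon W^{(u)}(z)\,\frac{z}{r}\,&\mathbb{P}\{X_r \in dz\} + \frac{W^{(u)}(\varepsilon)}{W^{(u)\prime}(a)}\,\Lambda_\varepsilon^{(u)}(a, r)\Big]\,\mathbb{E}_{|a-\varepsilon}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}}\} \\
={}& u\overline{W}^{(u)}(\varepsilon) - u\frac{W^{(u)}(a)}{W^{(u)\prime}(a)}\,W^{(u)}(\varepsilon) + \int_0^\varepsilon W^{(u)}(z)\,\frac{z}{r}\,\mathbb{P}\{X_r \in dz\} + \frac{W^{(u)}(\varepsilon)}{W^{(u)\prime}(a)}\,\Lambda_\varepsilon^{(u)}(a, r) \\
&- u\int_0^r dt\,\Big(\int_0^\varepsilon W^{(u)}(z)\,\frac{z}{t}\,\mathbb{P}\{X_t \in dz\} + \frac{W^{(u)}(\varepsilon)}{W^{(u)\prime}(a)}\,\Lambda_\varepsilon^{(u)}(a, t)\Big),
\end{aligned}
$$

or equivalently, we obtain after dividing both sides of the equation by $W^{(u)}(\varepsilon)$,

$$\mathbb{E}_{|a-\varepsilon}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}}\} = 1 + u\,\frac{\dfrac{\overline{W}^{(u)}(\varepsilon)}{W^{(u)}(\varepsilon)} - \dfrac{W^{(u)}(a)}{W^{(u)\prime}(a)} - \displaystyle\int_0^r dt\,\Big(\int_0^\varepsilon \frac{W^{(u)}(z)}{W^{(u)}(\varepsilon)}\,\frac{z}{t}\,\mathbb{P}\{X_t \in dz\} + \frac{\Lambda_\varepsilon^{(u)}(a, t)}{W^{(u)\prime}(a)}\Big)}{\displaystyle\int_0^\varepsilon \frac{W^{(u)}(z)}{W^{(u)}(\varepsilon)}\,\frac{z}{r}\,\mathbb{P}\{X_r \in dz\} + \frac{\Lambda_\varepsilon^{(u)}(a, r)}{W^{(u)\prime}(a)}}.$$

Using this result and putting it back in the Eq. (31), we arrive at

$$
\begin{aligned}
e^{ur}\,\mathbb{E}_{|y}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}}\} ={}& 1 + u\overline{W}^{(u)}(a - y) - u\frac{W^{(u)}(a)}{W^{(u)\prime}(a)}\,W^{(u)}(a - y) \\
&+ u\int_0^r dt\,\Big(\Omega_\varepsilon^{(u)}(a - y, t) - \frac{W^{(u)}(a - y)}{W^{(u)\prime}(a)}\,\Lambda_\varepsilon^{(u)}(a, t)\Big) \\
&+ u\,\frac{\dfrac{\overline{W}^{(u)}(\varepsilon)}{W^{(u)}(\varepsilon)} - \dfrac{W^{(u)}(a)}{W^{(u)\prime}(a)} - \displaystyle\int_0^r dt\,\Big(\int_0^\varepsilon \frac{W^{(u)}(z)}{W^{(u)}(\varepsilon)}\,\frac{z}{t}\,\mathbb{P}\{X_t \in dz\} + \frac{\Lambda_\varepsilon^{(u)}(a, t)}{W^{(u)\prime}(a)}\Big)}{\displaystyle\int_0^\varepsilon \frac{W^{(u)}(z)}{W^{(u)}(\varepsilon)}\,\frac{z}{r}\,\mathbb{P}\{X_r \in dz\} + \frac{\Lambda_\varepsilon^{(u)}(a, r)}{W^{(u)\prime}(a)}} \\
&\qquad \times \Big(\Omega_\varepsilon^{(u)}(a - y, r) - \frac{W^{(u)}(a - y)}{W^{(u)\prime}(a)}\,\Lambda_\varepsilon^{(u)}(a, r)\Big).
\end{aligned} \tag{33}
$$

We now want to compute the limit as $\varepsilon \downarrow 0$ of (33). In order to do this, recall by the
spatial homogeneity that $\mathbb{P}_{|y}\{\tau_r^\varepsilon \le t\} = \mathbb{P}_{|y+\varepsilon}\{\tau_r \le t\}$ and therefore by the
right-continuity of the map $y \mapsto \mathbb{P}_{|y}\{\tau_r \le t\}$, we have $\mathbb{P}_{|y}\{\tau_r \le t\} = \lim_{\varepsilon \downarrow 0} \mathbb{P}_{|y}\{\tau_r^\varepsilon \le t\}$.
Hence, by the weak convergence theorem the Laplace transform of $\mathbb{P}_{|y}\{\tau_r^\varepsilon \le t\}$
converges as $\varepsilon \downarrow 0$ to that of $\mathbb{P}_{|y}\{\tau_r \le t\}$, i.e.,
$\lim_{\varepsilon \downarrow 0} \mathbb{E}_{|y}\{e^{-u\tau_r^\varepsilon}\mathbf{1}_{\{\tau_r^\varepsilon < \infty\}}\} = \mathbb{E}_{|y}\{e^{-u\tau_r}\mathbf{1}_{\{\tau_r < \infty\}}\}$.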

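We consider two cases: $W^{(u)}(0+) > 0$ ($X$ has paths of bounded variation) and
$W^{(u)}(0+) = 0$ ($X$ has unbounded variation). For the case $W^{(u)}(0+) > 0$,

$$
\begin{aligned}
e^{ur}\,\mathbb{E}_{|y}\{e^{-u\tau_r}\mathbf{1}_{\{\tau_r < \infty\}}\} ={}& 1 + u\overline{W}^{(u)}(a - y) - u\frac{W^{(u)}(a)}{W^{(u)\prime}(a)}\,W^{(u)}(a - y) \\
&+ u\int_0^r dt\,\Big(\Omega^{(u)}(a - y, t) - \frac{W^{(u)}(a - y)}{W^{(u)\prime}(a)}\,\Lambda^{(u)}(a, t)\Big) \\
&- u\Big(\frac{W^{(u)}(a)}{\Lambda^{(u)}(a, r)} + \int_0^r \frac{\Lambda^{(u)}(a, t)}{\Lambda^{(u)}(a, r)}\,dt\Big)\Big(\Omega^{(u)}(a - y, r) - \frac{W^{(u)}(a - y)}{W^{(u)\prime}(a)}\,\Lambda^{(u)}(a, r)\Big),
\end{aligned} \tag{34}
$$

which after some further calculations simplifies to the main result (13).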
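For the case $W^{(u)}(0+) = 0$, we have after applying integration by parts that

$$\int_0^\varepsilon \frac{W^{(u)}(z)}{W^{(u)}(\varepsilon)}\,\frac{z}{r}\,\mathbb{P}\{X_r \in dz\} = \int_0^\varepsilon \frac{z}{r}\,\mathbb{P}\{X_r \in dz\} - \int_0^\varepsilon dz\,\frac{W^{(u)\prime}(z)}{W^{(u)}(\varepsilon)}\int_0^z \frac{w}{r}\,\mathbb{P}\{X_r \in dw\}.$$

The claim is established once we show that both this quantity and $\overline{W}^{(u)}(\varepsilon)/W^{(u)}(\varepsilon)$ vanish as $\varepsilon \downarrow 0$.
By employing l'Hôpital's rule we obtain

$$\lim_{\varepsilon \downarrow 0}\int_0^\varepsilon dz\,\frac{W^{(u)\prime}(z)}{W^{(u)}(\varepsilon)}\int_0^z \frac{w}{r}\,\mathbb{P}\{X_r \in dw\} = \lim_{\varepsilon \downarrow 0}\frac{W^{(u)\prime}(\varepsilon)\int_0^\varepsilon \frac{z}{r}\,\mathbb{P}\{X_r \in dz\}}{W^{(u)\prime}(\varepsilon)} = 0, \qquad \lim_{\varepsilon \downarrow 0}\frac{\overline{W}^{(u)}(\varepsilon)}{W^{(u)}(\varepsilon)} = \lim_{\varepsilon \downarrow 0}\frac{W^{(u)}(\varepsilon)}{W^{(u)\prime}(\varepsilon)} = 0.$$

This turns out to be the case when $X$ has paths of unbounded variation since
$W^{(u)\prime}(0+) = 2/\sigma^2$ if $\sigma \ne 0$ and is equal to $\infty$ if $\sigma = 0$. See for instance
Lemma 4.4 in Kyprianou and Surya [13]. On account of these results, we arrive
at the identity (34), which after some further calculations simplifies to the main
result (13). □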
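We have shown that (13) holds for $z < a$. We now prove that (13) holds for
$z > a$. For this purpose, recall that under measure $\mathbb{P}_{|y}$, with $y > a$, $\tau_a^+ = 0$ a.s. On
account of the fact that $W^{(u)}(x) = 0$ for $x < 0$, we have from (25) and (28) for $\varepsilon = 0$
and $y > a$,

$$\mathbb{P}_{|y}\{\tau_a^- \le r\} = \Omega^{(u)}(a - y, r) - u\int_0^r \Omega^{(u)}(a - y, t)\,dt, \tag{35}$$

$$\mathbb{E}_{|y}\{e^{-u\tau_a^-}\mathbf{1}_{\{\tau_a^- \le r\}}\} = e^{-ur}\,\Omega^{(u)}(a - y, r). \tag{36}$$

These identities can be proved by Kendall's identity, Tonelli, (16) and Laplace
inversion, taking account that for $x < 0$, $\int_0^\infty e^{-\theta t}\,\Omega^{(u)}(x, t)\,dt = e^{\Phi(\theta)x}/(\theta - u)$, $\theta > u$.
Indeed,

$$
\begin{aligned}
\int_0^\infty e^{-\theta t}\,\Omega^{(u)}(x, t)\,dt &= \int_0^\infty e^{-\theta t}\int_0^\infty W^{(u)}(z + x)\,\frac{z}{t}\,\mathbb{P}\{X_t \in dz\}\,dt \\
&= \int_0^\infty e^{-\theta t}\int_0^\infty W^{(u)}(z + x)\,\mathbb{P}\{T_z^+ \in dt\}\,dz \\
&= \int_0^\infty W^{(u)}(z + x)\int_0^\infty e^{-\theta t}\,\mathbb{P}\{T_z^+ \in dt\}\,dz \\
&= \int_0^\infty W^{(u)}(z + x)\,e^{-\Phi(\theta)z}\,dz \\
&= e^{\Phi(\theta)x}\int_x^\infty e^{-\Phi(\theta)z}\,W^{(u)}(z)\,dz. \qquad \square
\end{aligned}
$$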

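Starting from Eq. (23) with $\varepsilon = 0$, we obtain following identities (35)-(36),

$$
\begin{aligned}
\mathbb{E}_{|y}\{e^{-u\tau_r}\mathbf{1}_{\{\tau_r < \infty\}}\} ={}& e^{-ur}\Big[1 + u\int_0^r \Omega^{(u)}(a - y, t)\,dt - \Omega^{(u)}(a - y, r)\Big] \\
&+ e^{-ur}\,\Omega^{(u)}(a - y, r)\,\mathbb{E}_{|a}\{e^{-u\tau_r}\mathbf{1}_{\{\tau_r < \infty\}}\}.
\end{aligned} \tag{37}
$$

The expression for $\mathbb{E}_{|a}\{e^{-u\tau_r}\mathbf{1}_{\{\tau_r < \infty\}}\}$ is given by setting $z = a$ in (13):

$$\mathbb{E}_{|a}\{e^{-u\tau_r}\mathbf{1}_{\{\tau_r < \infty\}}\} = 1 - u\,\frac{W^{(u)}(a)}{\Lambda^{(u)}(a, r)} - u\int_0^r \frac{\Lambda^{(u)}(a, t)}{\Lambda^{(u)}(a, r)}\,dt, \tag{38}$$

where we have used in the calculation above the fact that $\Omega^{(u)}(0, t) = e^{ut}$, see
Eq. (22). By inserting (38) in (37) we obtain after some further calculations that

$$
\begin{aligned}
\mathbb{E}_{|y}\{e^{-u\tau_r}\mathbf{1}_{\{\tau_r < \infty\}}\} = e^{-ur}\Big\{1 + u\Big[&\overline{W}^{(u)}(a - y) - \frac{\Omega^{(u)}(a - y, r)}{\Lambda^{(u)}(a, r)}\,W^{(u)}(a) \\
&+ \int_0^r \Big(\Omega^{(u)}(a - y, t) - \frac{\Omega^{(u)}(a - y, r)}{\Lambda^{(u)}(a, r)}\,\Lambda^{(u)}(a, t)\Big)\,dt\Big]\Big\},
\end{aligned}
$$

which corresponds to (13) for $z > a$, showing that (13) holds for any $z \ge 0$. □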


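4.2 Proof of Proposition 1


Applying the Esscher transform of measure (4) to the result (13), we have

$$
\begin{aligned}
\mathbb{E}_{y,x}\{e^{-u\tau_r + \nu X_{\tau_r}}\mathbf{1}_{\{\tau_r < \infty\}}\} &= e^{\nu x}\,\mathbb{E}_{y,x}\{e^{-p\tau_r}\,e^{\nu(X_{\tau_r} - x) - \psi(\nu)\tau_r}\mathbf{1}_{\{\tau_r < \infty\}}\} \\
&= e^{\nu x}\,\mathbb{E}^\nu_{y,x}\{e^{-p\tau_r}\mathbf{1}_{\{\tau_r < \infty\}}\},
\end{aligned} \tag{39}
$$

where we have defined $p = u - \psi(\nu)$. Under the new measure $\mathbb{P}^\nu$,

$$
\begin{aligned}
\mathbb{E}^\nu_{y,x}\{e^{-p\tau_r}\mathbf{1}_{\{\tau_r < \infty\}}\} = e^{-pr}\Big\{1 + p\Big[&\overline{W}_\nu^{(p)}(a + x - y) - \frac{\Omega_\nu^{(p)}(a + x - y, r)}{\Lambda_\nu^{(p)}(a, r)}\,W_\nu^{(p)}(a) \\
&+ \int_0^r \Big(\Omega_\nu^{(p)}(a + x - y, t) - \frac{\Omega_\nu^{(p)}(a + x - y, r)}{\Lambda_\nu^{(p)}(a, r)}\,\Lambda_\nu^{(p)}(a, t)\Big)\,dt\Big]\Big\},
\end{aligned}
$$

following which and the Eq. (39) our claim in (14) is established. □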


5 Conclusions 


We have presented some new results concerning the Parisian ruin problem under a Lévy
insurance risk process, where ruin is announced when the risk process has gone
below a certain level from the last record maximum of the process, also known
as the drawdown, for a fixed consecutive period of time. They further extend the
existing results on Parisian ruin below a fixed level of the risk process. Using recent
developments in the fluctuation and excursion theory of the drawdown of the Lévy risk
process, the law of the ruin time and the position at ruin was given in terms of their
joint Laplace transforms. Identities are presented semi-explicitly in terms of the
scale function and the law of the Lévy process. The results can be used to calculate
some quantities of interest in finance and insurance as discussed in the introduction.


Simple Group Actions on Arc-Transitive
Graphs with Prescribed Transitive Local
Action


Marston Conder 


Abstract This paper gives a partial answer to a question asked by Pierre-Emmanuel 
Caprace at the Groups St Andrews conference at Birmingham (UK) in August 2017, 
and investigated at the “Tutte Centenary Retreat’ workshop held at MATRIX in 
November 2017. Caprace asked if there exists a 2-transitive permutation group P 
such that only finitely many simple groups act arc-transitively on a connected graph 
X with local action P (of the stabiliser of a vertex v on the neighbourhood of v). 
Some evidence is given to suggest that the answer is “No”, even when ‘2-transitive’ 
is replaced by ‘transitive’, and then by way of illustration, a follow-up question 
is answered by showing that all but finitely many alternating groups have such an
action on a 6-valent connected graph with vertex-stabiliser $A_6$.


1 Introduction 


At the Groups St Andrews conference held at Birmingham (UK) in August 2017, 
Pierre-Emmanuel Caprace asked if there exists a 2-transitive permutation group P 
such that only finitely many simple groups act arc-transitively on a connected graph 
X in such a way that the stabiliser (in the simple group) of a vertex v induces P on 
the neighbourhood of v. This question and a follow-up question about what happens
when P is the alternating group $A_6$ were conveyed by Gabriel Verret and Michael
Giudici at the 'Tutte Centenary Retreat' workshop held at MATRIX in November
2017. What follows is a partial answer to the main question, showing that even
when '2-transitive' is replaced by 'transitive', no such group P can exist if a certain
conjecture about alternating quotients of amalgamated free products is valid, and
then a full answer to the sub-question, showing that P cannot be $A_6$, as well as
noting that P cannot be one of a number of other permutation groups.


2 The General Question


One approach that can be taken to the general question is to consider the action of
a group G on a graph X with the property that the stabiliser V in G of a vertex v
of X is isomorphic to P (and induces P on the neighbourhood X(v)). First, let d
be the degree of P (as a transitive permutation group). Then we may suppose that
$d \ge 3$, since the automorphism groups of 2-valent connected arc-transitive graphs
are dihedral and therefore soluble, and the question by Caprace is not relevant.

Now observe that if E is the stabiliser of an edge e = {v, w} incident with v, and
A is the stabiliser of the arc (v, w) (and hence isomorphic to a point-stabiliser in the
given group P), then G is a homomorphic image of the free product $V *_A E$ with
the subgroup $A = V \cap E$ amalgamated. Moreover, A has index d in V, and 2 in E.

Next, a conjecture made by Džambić and Jones [10] and supported by the author
asserts that if V and E are any two finite groups with a common subgroup A with
index $|V : A| \ge 3$ and index $|E : A| \ge 2$, then all but finitely many alternating
groups $A_n$ occur as homomorphic images of the amalgamated free product $V *_A E$.
An even stronger version of this conjecture (believed to be true by the author) is as
follows:


Conjecture 1 Let V and E be any two finite groups with a common subgroup A
with index $|V : A| \ge 3$ and index $|E : A| \ge 2$, and let K be the core of A in
$V *_A E$. Then all but finitely many $A_n$ occur as the image of the amalgamated free
product $V *_A E$ under some homomorphism that takes V and E to subgroups (of
$A_n$) isomorphic to V/K and E/K respectively. In particular, if the amalgamated
subgroup A is core-free in $V *_A E$, then all but finitely many $A_n$ occur as images of
$V *_A E$ under homomorphisms that are faithful on each of V and E.


It is easy to see this is stronger than the conjecture in [10], since for example any
quotient of $C_2 *_{C_1} C_3 = C_2 * C_3$ (the modular group) is also a quotient of $C_4 *_{C_2} C_6$,
but not vice versa. Also there is plenty of evidence in support of it. Indeed it is known
to be true in many special cases (proved well before the original conjecture was 
made in [10]), such as those arising in the way described above from the study of 
finite arc-transitive and/or path-transitive 3-valent graphs [4, 8], or 7-arc-transitive 
4-valent graphs [9], or similarly from the study of arc-transitive digraphs [6], chiral 
maps [2] or chiral polytopes [5], and even hyperbolic 3-manifolds [7]. 

Furthermore, if the above conjecture is valid, then the answer to Caprace’s main 
question can be shown to be ‘No’, even when V is not 2-transitive: 


Theorem 1 If Conjecture 1 is valid, then for every transitive finite permutation
group P, all but finitely many alternating groups A_n act arc-transitively on a
connected graph X in such a way that the stabiliser in A_n of a vertex v induces
P on the neighbourhood of v.


Proof Let V, E and A be as above, with E chosen as a group containing an index 2
subgroup isomorphic to A, and consider the amalgamated free product V *_A E. Note
that because A is a point-stabiliser in the permutation group V (≅ P), it is core-free
in V and hence also A is core-free in V *_A E. Suppose further that θ: V *_A E → G
is any epimorphism to a finite non-abelian simple group G such that θ is faithful on
each of V and E, and also let a be any element of the image of E \ A in G, and let
H be the θ-image of V (so that H is isomorphic to P).

Now let X be the double-coset graph X(G, H, a), with vertices defined as the
right cosets of H in G, with cosets Hx and Hy adjacent in X if and only if xy^{-1} ∈
HaH. This is a well-known construction attributed to Sabidussi, and described in
detail in [4, 9] for example. The construction ensures that G acts as an arc-transitive
group of automorphisms of the graph X (by right multiplication of cosets Hx in G),
with vertex-stabiliser H acting transitively on the neighbourhood X(H) = {Hah :
h ∈ H} of the trivial coset H. Moreover, this action of H is equivalent to the action
of V on cosets of A (by right multiplication), and hence the same as the natural
action of the given permutation group P, as required.

Finally, if Conjecture 1 is valid then we can take G as the alternating group A_n
for all but finitely many n, and this completes the proof. □


3 Some Specific Cases 


The same argument as used in the above proof can be applied to many specific cases 
where Conjecture 1 is known to be valid.

For example, this is often known to happen when the amalgamated subgroup A
in V *_A E is trivial. The validity of Conjecture 1 for the free products C_3 * C_2 and
C_k * C_2 for all k ≥ 7 follows from the fact that all but finitely many alternating
groups are quotients of the ordinary (2, 3, k) triangle group for any given such k
(see [3]), and the same holds for C_k * C_2 for all k ∈ {4, 5, 6} by the analogous
properties of the (2, k, m) triangle groups for 4 ≤ k < m (see [11]).

Similarly, the fact that all but finitely many alternating groups are quotients of the
extended (2, 3, k) triangle group (see [3]) shows that the same thing holds for D_k *_{C_2} V_4
for k = 3 and all k ≥ 7. Also in [10] it was shown that infinitely many alternating
groups occur as quotients of A_5 *_{C_5} D_5.

Hence, in particular, the answer to Caprace’s question is ‘No’ when P is a cyclic
or dihedral group of degree 3 or more, or the group of degree 12 induced by A_5 on
cosets of a subgroup of order 5. It is fairly clear that the same answer holds for many
other permutation groups besides these, and we complete this paper (and answer the
sub-question mentioned earlier) by considering the case where P = A_6.

From now on we take V as A_6 and A as its point-stabiliser A_5, and we choose
E as A_5 × C_2. Just as before, note that A is core-free in V and hence also core-free
in V *_A E. We will show that all but finitely many alternating groups A_n occur as
images of V *_A E under homomorphisms that are faithful on each of V and E.

To do this, first we note that V *_A E = A_6 *_{A_5} (A_5 × C_2) is generated by three
elements x, y and a with the following properties:


* x and y generate V ≅ A_6 and satisfy the relations x^2 = y^5 = (xy)^5 = (xy^2)^4 = 1,
* y and u = xy^{-1}xyx generate A ≅ A_5 (and satisfy u^2 = y^5 = (uy^2)^3 = 1), and
* y, u and a generate E ≅ A_5 × C_2, and satisfy a^2 = [u, a] = 1 and y^a = y^{-1}.


These properties may be seen by taking x = (3, 6)(4, 5) and y = (1, 2, 3, 4, 5)
in A_6, with u = (1, 4)(3, 5), and by viewing a as the inner automorphism of A_5
induced by conjugation by (1, 3)(4, 5).
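The stated properties of x, y and u can be checked by direct computation. The following is a minimal sketch, assuming permutations are composed left to right (apply the first factor first) and reading the relations as x^2 = y^5 = (xy)^5 = (xy^2)^4 = 1 and u^2 = y^5 = (uy^2)^3 = 1; the helper names perm, mul, inv, order and group_order are illustrative and not taken from the paper.

```python
def perm(cycles, n):
    """Permutation of {1, ..., n} as a tuple p with p[i] = image of i (p[0] is unused)."""
    p = list(range(n + 1))
    for cyc in cycles:
        for i, pt in enumerate(cyc):
            p[pt] = cyc[(i + 1) % len(cyc)]
    return tuple(p)

def mul(*perms):
    """Compose left to right: mul(p, q) applies p first, then q."""
    out = tuple(range(len(perms[0])))
    for p in perms:
        out = tuple(p[i] for i in out)
    return out

def inv(p):
    q = [0] * len(p)
    for i, j in enumerate(p):
        q[j] = i
    return tuple(q)

def order(p):
    identity, q, k = tuple(range(len(p))), p, 1
    while q != identity:
        q, k = mul(q, p), k + 1
    return k

def group_order(gens):
    """Size of the group generated by gens, found by closing under right multiplication."""
    identity = tuple(range(len(gens[0])))
    seen, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return len(seen)

x = perm([(3, 6), (4, 5)], 6)
y = perm([(1, 2, 3, 4, 5)], 6)
u = mul(x, inv(y), x, y, x)        # u = x y^-1 x y x
t = perm([(1, 3), (4, 5)], 6)      # a acts on <y, u> as conjugation by t

# Orders of x, y, xy, xy^2 and uy^2, in that order.
orders = [order(x), order(y), order(mul(x, y)), order(mul(x, y, y)), order(mul(u, y, y))]
```

Under these conventions the computed u agrees with (1, 4)(3, 5), the groups generated by {x, y} and {y, u} have orders 360 and 60, and conjugation by (1, 3)(4, 5) inverts y and centralises u.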

Next, we consider six particular transitive permutation representations of V *_A E,
of degrees 1, 12, 42, 62, 21 and 31, as given below. In each case we give also the
permutations induced by u = xy^{-1}xyx and w = xa, and identify the fixed points of
the subgroup E (generated by y, u and a), and call these fixed points ‘link points’,
for reasons that should soon become clear.


Representation R_1 (degree 1)


x ↦ (),
y ↦ (),
u ↦ (),
a ↦ (),
w ↦ ().
Link point 1


Representation R_2 (degree 12)


x ↦ (1, 2)(3, 4)(9, 12)(10, 11),

y ↦ (2, 3, 4, 5, 6)(7, 8, 9, 10, 11),

u ↦ (2, 4)(3, 5)(7, 10)(9, 11),

a ↦ (2, 7)(3, 11)(4, 10)(5, 9)(6, 8),

w ↦ (1, 7, 2)(3, 10)(4, 11)(5, 9, 12)(6, 8).
Link points 1 and 12


Representation R_3 (degree 42)


x ↦ (1, 2)(3, 4)(7, 12)(8, 20)(9, 32)(10, 27)(11, 24)(13, 29)(14, 33)(15, 18)
(16, 31)(17, 25)(21, 34)(23, 36)(26, 28)(30, 35)(39, 42)(40, 41),

y ↦ (2, 3, 4, 5, 6)(7, 8, 9, 10, 11)(12, 13, 14, 15, 16)(17, 18, 19, 20, 21)
(22, 23, 24, 25, 26)(27, 28, 29, 30, 31)(32, 33, 34, 35, 36)
(37, 38, 39, 40, 41),

u ↦ (2, 4)(3, 5)(7, 10)(9, 11)(12, 17)(13, 22)(14, 27)(15, 28)(16, 23)(18, 21)
(19, 31)(20, 29)(24, 26)(25, 30)(32, 34)(33, 35)(37, 40)(39, 41),

a ↦ (2, 7)(3, 11)(4, 10)(5, 9)(6, 8)(13, 16)(14, 15)(18, 21)(19, 20)(22, 23)
(24, 26)(27, 28)(29, 31)(32, 37)(33, 41)(34, 40)(35, 39)(36, 38),

w ↦ (1, 7, 12, 2)(3, 10, 28, 24)(4, 11, 26, 27)(5, 9, 37, 32)(6, 8, 19, 20)
(13, 31)(14, 41, 34, 18)(15, 21, 40, 33)(16, 29)(17, 25)(22, 23, 38, 36)
(30, 39, 42, 35).

Link points 1 and 42


Representation R_4 (degree 62)


x ↦ (1, 2)(3, 4)(7, 12)(8, 20)(10, 14)(11, 21)(13, 16)(15, 18)(23, 32)(24, 31)
(25, 35)(26, 40)(27, 41)(28, 33)(29, 39)(36, 38)(42, 52)(43, 50)(44, 46)
(45, 54)(47, 55)(48, 53)(57, 61)(60, 62),

y ↦ (2, 3, 4, 5, 6)(7, 8, 9, 10, 11)(12, 13, 14, 15, 16)(17, 18, 19, 20, 21)
(22, 23, 24, 25, 26)(27, 28, 29, 30, 31)(32, 33, 34, 35, 36)
(37, 38, 39, 40, 41)(42, 43, 44, 45, 46)(47, 48, 49, 50, 51)
(52, 53, 54, 55, 56)(57, 58, 59, 60, 61),

u ↦ (2, 4)(3, 5)(7, 10)(9, 11)(12, 17)(13, 19)(16, 20)(18, 21)(22, 27)(23, 29)
(26, 30)(28, 31)(33, 37)(34, 39)(35, 41)(38, 40)(44, 47)(45, 49)(46, 51)
(48, 50)(52, 54)(53, 56)(57, 60)(58, 61),

a ↦ (2, 7)(3, 11)(4, 10)(5, 9)(6, 8)(12, 22)(13, 26)(14, 25)(15, 24)(16, 23)
(17, 27)(18, 31)(19, 30)(20, 29)(21, 28)(32, 42)(33, 46)(34, 45)(35, 44)
(36, 43)(37, 51)(38, 50)(39, 49)(40, 48)(41, 47)(52, 57)(53, 61)(54, 60)
(55, 59)(56, 58),

w ↦ (1, 7, 22, 12, 2)(3, 10, 25, 44, 33, 21)(4, 11, 28, 46, 35, 14)(5, 9)
(6, 8, 29, 49, 39, 20)(13, 23, 42, 57, 53, 40)(15, 31)
(16, 26, 48, 61, 52, 32)(17, 27, 47, 59, 55, 41)(18, 24)(19, 30)
(34, 45, 60, 62, 54)(36, 50)(37, 51)(38, 43)(56, 58).

Link points 1 and 62


Representation R_5 (degree 21)


x ↦ (1, 2)(3, 4)(7, 12)(8, 20)(10, 14)(11, 21)(13, 16)(15, 18),

y ↦ (2, 3, 4, 5, 6)(7, 8, 9, 10, 11)(12, 13, 14, 15, 16)(17, 18, 19, 20, 21),

u ↦ (2, 4)(3, 5)(7, 10)(9, 11)(12, 17)(13, 19)(16, 20)(18, 21),

a ↦ (2, 7)(3, 11)(4, 10)(5, 9)(6, 8)(13, 16)(14, 15)(18, 21)(19, 20),

w ↦ (1, 7, 12, 2)(3, 10, 15, 21)(4, 11, 18, 14)(5, 9)(6, 8, 19, 20).

Link point 1


Representation R_6 (degree 31)


x ↦ (1, 2)(3, 4)(7, 12)(8, 20)(10, 14)(11, 21)(13, 16)(15, 18)(23, 25)(24, 31)
(26, 28)(27, 29),

y ↦ (2, 3, 4, 5, 6)(7, 8, 9, 10, 11)(12, 13, 14, 15, 16)(17, 18, 19, 20, 21)
(22, 23, 24, 25, 26)(27, 28, 29, 30, 31),

u ↦ (2, 4)(3, 5)(7, 10)(9, 11)(12, 17)(13, 19)(16, 20)(18, 21)(22, 27)(23, 29)
(26, 30)(28, 31),

a ↦ (2, 7)(3, 11)(4, 10)(5, 9)(6, 8)(12, 22)(13, 26)(14, 25)(15, 24)(16, 23)
(17, 27)(18, 31)(19, 30)(20, 29)(21, 28),

w ↦ (1, 7, 22, 12, 2)(3, 10, 25, 16, 26, 21)(4, 11, 28, 13, 23, 14)(5, 9)
(6, 8, 29, 17, 27, 20)(15, 31)(18, 24)(19, 30).

Link point 1


Note that in each case, the permutations induced by x, y and u are necessarily
even, since they generate a subgroup isomorphic to A_6 or the trivial group. On
the other hand, the permutations induced by the involution a have 0, 5, 18, 30, 9
and 15 transpositions respectively, and hence the permutations induced by a and
w = xa are even in representations R_1, R_3 and R_4, but are odd in representations
R_2, R_5 and R_6. Indeed, the cycle structure of the permutation induced by w =
xa in representations R_2 to R_6 is 2^3 3^2, 2^3 4^9, 2^8 5^2 6^6, 1^3 2^1 4^4 and 2^4 5^1 6^3,
respectively.
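The cycle structures quoted here can be recomputed from x and a alone, since w = xa. The sketch below does this for the two smallest non-trivial building blocks, assuming left-to-right composition and reading the garbled entries of R_2 and R_5 as the permutations written in the code; the dictionaries R2 and R5 and the helper names are illustrative.

```python
from collections import Counter

def perm(cycles, n):
    """Permutation of {1, ..., n} as a tuple p with p[i] = image of i (p[0] is unused)."""
    p = list(range(n + 1))
    for cyc in cycles:
        for i, pt in enumerate(cyc):
            p[pt] = cyc[(i + 1) % len(cyc)]
    return tuple(p)

def mul(p, q):
    """Apply p first, then q."""
    return tuple(q[i] for i in p)

def cycle_type(p):
    """Map cycle length -> number of cycles (fixed points count as 1-cycles)."""
    seen, lengths = set(), Counter()
    for start in range(1, len(p)):
        if start in seen:
            continue
        length, pt = 0, start
        while pt not in seen:
            seen.add(pt)
            pt = p[pt]
            length += 1
        lengths[length] += 1
    return dict(lengths)

# Transpositions of x and a, as read from the listings of R_2 and R_5 (assumed readings).
R2 = {"n": 12,
      "x": [(1, 2), (3, 4), (9, 12), (10, 11)],
      "a": [(2, 7), (3, 11), (4, 10), (5, 9), (6, 8)]}
R5 = {"n": 21,
      "x": [(1, 2), (3, 4), (7, 12), (8, 20), (10, 14), (11, 21), (13, 16), (15, 18)],
      "a": [(2, 7), (3, 11), (4, 10), (5, 9), (6, 8), (13, 16), (14, 15), (18, 21), (19, 20)]}

def w_cycle_type(rep):
    """Cycle type of w = xa in the given representation."""
    return cycle_type(mul(perm(rep["x"], rep["n"]), perm(rep["a"], rep["n"])))

def a_is_even(rep):
    """a is a product of disjoint transpositions, so its parity is the parity of their number."""
    return len(rep["a"]) % 2 == 0
```

With these readings, w has cycle type 2^3 3^2 in R_2 and 1^3 2^1 4^4 in R_5, and a is odd in both, in line with the parity statement above.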

We will use these six representations as ‘building blocks’ for constructing
transitive permutation representations of V *_A E of arbitrarily large degree, by using
the link points to join representations together.

To help to explain that, we observe how the image of each representation of
V *_A E splits into orbits of the subgroups V = ⟨x, y⟩ ≅ A_6, E = ⟨y, u, a⟩ ≅
A_5 × C_2 and A = V ∩ E = ⟨y, u⟩ ≅ A_5. For example, the image of R_3 (of
degree 42) splits into three orbits of V, of lengths 6, 30 and 6, namely {1, 2, ..., 6},
{7, 8, ..., 36} and {37, 38, ..., 42}, and these in turn split into seven orbits of
A, of lengths 1, 5, 5, 20, 5, 5 and 1, namely {1}, {2, 3, ..., 6}, {7, 8, ..., 11},
{12, 13, ..., 31}, {32, 33, ..., 36}, {37, 38, ..., 41} and {42}. Every orbit of the
subgroup E = ⟨A, a⟩ is then either an orbit of A preserved by a, or a union
of two orbits of A that are interchanged by a. For example (again), in R_3 the
subgroup E has five orbits, of lengths 1, 10, 20, 10 and 1, namely {1}, {2, 3, ..., 11},
{12, 13, ..., 31}, {32, 33, ..., 41} and {42}.

This orbit decomposition is depicted for all six of our ‘building block’
representations in Fig. 1, with each small box indicating an orbit of A (and the number
inside it indicating the length of that orbit), and each thin horizontal line indicating
a connection between a pair of orbits of A that are interchanged by a. In particular,
each small box with a ‘1’ inside it contains a link point, fixed by E = ⟨y, u, a⟩.

Next, if we take any two transitive permutation representations of V *_A E, say
of degrees n_1 and n_2, such that each representation contains at least one link point,
then we can join them together to form a larger one of degree n_1 + n_2, by simply
concatenating the permutations induced by each of x, y and a, and then adding a
transposition to a that swaps the two chosen link points.

For example, we can join the first two representations together by re-labelling
the single point of R_1 as ‘13’, and then adding a new transposition (12, 13) to the
permutation induced by a. This gives a transitive representation on 13 points, in
which x, y and u induce the same permutations as given in R_2, while a induces
the involution (2, 7)(3, 11)(4, 10)(5, 9)(6, 8)(12, 13) and w = xa induces the
permutation (1, 7, 2)(3, 10)(4, 11)(5, 9, 13, 12)(6, 8).
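The join of R_2 with R_1 is small enough to check by hand or by machine. The sketch below rebuilds it on 13 points, assuming left-to-right composition and the reading of R_2 used in the code, and confirms that the relations a^2 = [u, a] = 1 and y^a = y^{-1} survive the extra transposition (12, 13); the helper names are illustrative.

```python
def perm(cycles, n):
    """Permutation of {1, ..., n} as a tuple p with p[i] = image of i (p[0] is unused)."""
    p = list(range(n + 1))
    for cyc in cycles:
        for i, pt in enumerate(cyc):
            p[pt] = cyc[(i + 1) % len(cyc)]
    return tuple(p)

def mul(*perms):
    """Compose left to right: mul(p, q) applies p first, then q."""
    out = tuple(range(len(perms[0])))
    for p in perms:
        out = tuple(p[i] for i in out)
    return out

def inv(p):
    q = [0] * len(p)
    for i, j in enumerate(p):
        q[j] = i
    return tuple(q)

def cycles(p):
    """Non-trivial cycles of p, each starting at its smallest point, in increasing order."""
    seen, out = set(), []
    for start in range(1, len(p)):
        if start in seen or p[start] == start:
            continue
        cyc, pt = [], start
        while pt not in seen:
            seen.add(pt)
            cyc.append(pt)
            pt = p[pt]
        out.append(tuple(cyc))
    return out

n = 13                                   # 12 points of R_2 plus the point of R_1, relabelled 13
identity = tuple(range(n + 1))
x = perm([(1, 2), (3, 4), (9, 12), (10, 11)], n)
y = perm([(2, 3, 4, 5, 6), (7, 8, 9, 10, 11)], n)
a = perm([(2, 7), (3, 11), (4, 10), (5, 9), (6, 8), (12, 13)], n)   # new transposition (12, 13)
u = mul(x, inv(y), x, y, x)
w = mul(x, a)
```

Here cycles(w) lists the cycles of w in the joined representation; the 4-cycle through 12 and 13 is the result of merging the 3-cycle (5, 9, 12) of R_2 with the fixed point of R_1.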

Here, and in general when a pair of transitive representations are joined together
in this way, the images of x, y and a still satisfy the same relations as in V *_A E,
and hence (by the universal property of amalgamated products), the definition of the
images extends to a new permutation representation of V *_A E. The only significant
change is made to the permutation induced by a, and this simply joins two single-
point orbits of E = ⟨y, u, a⟩ into a single two-point orbit of E. Similarly, the cycles
of w = xa containing the two link points are merged into a single cycle.

Fig. 1 Our ‘building block’ representations 1 to 6 (on 1, 12, 42, 62, 21 and 31 points respectively)

For another example, suppose we join together a copy of each of R_5 and R_2 by
adding a new transposition that swaps the link point 1 of R_5 with link point 1 of
R_2 (suitably re-labelled). Then we obtain a transitive permutation representation on
21 + 12 = 33 points. Before the join, the permutations induced by w have cycle
structures 1^3 2^1 4^4 and 2^3 3^2, with link point 1 of R_5 lying in a cycle of length 4 and
link point 12 of R_2 lying in a cycle of length 3. The effect of the join is to merge
those two cycles into a single cycle of length 7, leaving other cycles unchanged.

We have now dealt with enough properties of the building blocks and their 
conjunction to prove the following: 


Theorem 2 For all but finitely many positive integers n, both the alternating group
A_n and the symmetric group S_n are homomorphic images of the amalgamated free
product A_6 *_{A_5} (A_5 × C_2), and hence act faithfully as an arc-transitive group of
automorphisms of some 6-valent graph with vertex-stabiliser isomorphic to A_6.


Proof For any positive integers k and m, let n = 21 + 12k + 62m, and observe that
every odd positive integer n ≥ 395 is expressible in this way.
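The claim about expressible degrees is a small numerical-semigroup statement and can be checked by brute force. The sketch below assumes the bound is meant as n ≥ 395 with k and m both at least 1; the function name representable and the search limit 5000 are illustrative choices.

```python
def representable(n):
    """True if n = 21 + 12k + 62m for some integers k >= 1 and m >= 1."""
    for m in range(1, n // 62 + 1):
        rest = n - 21 - 62 * m          # must equal 12k with k >= 1
        if rest > 0 and rest % 12 == 0:
            return True
    return False

# Odd integers below the search limit that are not of the form 21 + 12k + 62m.
odd_gaps = [n for n in range(1, 5000, 2) if not representable(n)]
largest_odd_gap = max(odd_gaps)
```

In this range the largest odd integer that is not expressible is 393, which is consistent with the threshold 395: writing (n − 21)/2 = 6k + 31m, every integer above 6 · 31 = 186 is a positive combination of the coprime numbers 6 and 31, while 186 itself is not.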

Now construct a transitive permutation representation of A_6 *_{A_5} (A_5 × C_2) of odd
degree n by stringing together a single copy of R_5 with k copies of R_2, and then m
copies of R_4. Then the permutation induced by a has 9 + 6k + 31m transpositions
(with k + m of these coming from the linkages), and so the permutations induced by
a and w are even when m is odd, but odd when m is even. Indeed, the permutation
induced by w has cycle structure 1^3 2^{1+3k+8m} 4^3 5^1 6^{k−1+6m} 7^1 8^1 10^{m−1}.

The single 7-cycle comes from the linkage between the copy of R_5 and the first
copy of R_2. Also the length of every other cycle of w divides 120, so w^120 is a
single 7-cycle. Moreover, this 7-cycle contains a pair of points interchanged by x,
a fixed point of y, and a pair of points interchanged by a. It follows that the image
of this new representation is primitive (for otherwise there would be a block B of
imprimitivity containing all 7 points of the 7-cycle, but then B would be preserved
by each of x, y and a and hence by the whole group). And now by a theorem of
Jordan [12, Theorem 13.9], this 7-cycle ensures that the permutations generate A_n
for large n ≡ 3 mod 4 when m is odd, and S_n for large n ≡ 1 mod 4 when m is
even.
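The bookkeeping in this step can be sanity-checked arithmetically. The sketch below assumes the cycle structure of w reads 1^3 2^{1+3k+8m} 4^3 5^1 6^{k−1+6m} 7^1 8^1 10^{m−1} (a reconstruction from the building blocks: joins R_5–R_2, R_2–R_2, R_2–R_4 and R_4–R_4 merge cycles of lengths 4+3, 3+3, 3+5 and 5+5), and checks the degree, the parity and the exponent 120; the function names are illustrative.

```python
from functools import reduce
from math import gcd

def w_cycle_structure(k, m):
    """Assumed cycle structure of w for one R_5, k copies of R_2 and m copies of R_4."""
    return {1: 3, 2: 1 + 3 * k + 8 * m, 4: 3, 5: 1,
            6: k - 1 + 6 * m, 7: 1, 8: 1, 10: m - 1}

def degree(cs):
    """Number of points moved or fixed: sum of length times multiplicity."""
    return sum(length * count for length, count in cs.items())

def is_even(cs):
    """A permutation is even iff it has an even number of even-length cycles."""
    return sum(count for length, count in cs.items() if length % 2 == 0) % 2 == 0

def lcm_of_other_lengths(cs):
    """Least common multiple of all cycle lengths present, apart from 7."""
    lengths = [l for l, c in cs.items() if c > 0 and l != 7]
    return reduce(lambda a, b: a * b // gcd(a, b), lengths, 1)

def summary(k, m):
    cs = w_cycle_structure(k, m)
    return degree(cs), is_even(cs), lcm_of_other_lengths(cs)
```

For example, summary(1, 1) returns (95, True, 120): the degree is 21 + 12 + 62, w is even because m is odd, and since 120 ≡ 1 mod 7 the power w^120 kills every cycle except the 7-cycle.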

Next, we can add a copy of R_1 to the final copy of R_4, and get a transitive
representation of A_6 *_{A_5} (A_5 × C_2) of even degree n = 21 + 12k + 62m + 1, and the
same argument works, except that the parity of the permutations a and w changes,
with a 5-cycle of w = xa becoming another 6-cycle. In this case the permutations
generate S_n with n ≡ 0 mod 4 when m is odd, and A_n with n ≡ 2 mod 4 when m is
even.

Finally, we can replace the single copy of R_5 by a copy of R_6, and insert a single
copy of R_3 between the k copies of R_2 and the m copies of R_4, and get transitive
permutation representations of odd degree n = 31 + 12k + 42 + 62m and even
degree n = 31 + 12k + 42 + 62m + 1, in which the permutations induced by a and
w are even if and only if m is even in the first case, and are even if and only if m is
odd in the second case. The cycle structure of xa is altered by addition of a 9-cycle,
plus some changes in the 1-, 2-, 4-, 6- and 8-cycles, and replacement of the single
7-cycle by a new single 7-cycle coming from the linkage between R_2 and R_3, but
the same arguments apply as earlier. In this case the induced permutations generate
A_n for large n ≡ 1 mod 4 and S_n for large n ≡ 2 mod 4 when m is even, and S_n for
large n ≡ 3 mod 4 and A_n for large n ≡ 0 mod 4 when m is odd.

These constructions cover all residue classes mod 4 for the degree n, for both A_n
and S_n, for large enough n (indeed for all n ≥ 447), as required. □


Incidentally, we also obtain the following, because if a is any involution in E \ A,
then the index 2 subgroup S = ⟨V, V^a⟩ in the group V *_A E = A_6 *_{A_5} (A_5 ×
C_2) used above is isomorphic to A_6 *_{A_5} A_6, and also maps onto A_n for large n.
This strengthens an observation made by Peter Neumann and Cheryl Praeger at
the Groups St Andrews conference that A_6 *_{A_5} A_6 has infinitely many alternating
quotients.


Corollary 1 All but finitely many alternating groups occur as quotients of the 
amalgamated free product A_6 *_{A_5} A_6.


Biquasiprimitive Oriented Graphs
of Valency Four


Nemanja Poznanović and Cheryl E. Praeger


Abstract In this short note we describe a recently initiated research programme 
aiming to use a normal quotient reduction to analyse finite connected, oriented 
graphs of valency 4, admitting a vertex- and edge-transitive group of automorphisms 
which preserves the edge orientation. In the first article on this topic (Al-bar et 
al. Electr J Combin 23, 2016), a subfamily of these graphs was identified as 
‘basic’ in the sense that all graphs in this family are normal covers of at least 
one ‘basic’ member. These basic members can be further divided into three types: 
quasiprimitive, biquasiprimitive and cycle type. The first and third of these types 
have been analysed in some detail. Recently, we have begun an analysis of the basic 
graphs of biquasiprimitive type. We describe our approach and mention some early 
results. This work is on-going. It began at the Tutte Memorial MATRIX Workshop. 


A graph Γ is said to be G-oriented for some subgroup G ≤ Aut(Γ), if G acts
transitively on the vertices and edges of Γ, and Γ admits a G-invariant orientation of
its edges. Any graph Γ admitting a group of automorphisms which acts transitively
on its vertices and edges but which does not act transitively on its arcs can be viewed
as a G-oriented graph. These graphs are usually said to be G-half-arc-transitive, and
form a well-studied class of vertex-transitive graphs.

A G-oriented graph Γ necessarily has even valency, with exactly half of the
edges incident to a vertex α being oriented away from α to one of its neighbours.
Since such a graph is vertex-transitive, it follows that all of its connected
components are isomorphic, and each connected component is itself a G-oriented graph.
We therefore restrict our attention to the study of connected G-oriented graphs.
For each even integer m, we let OG(m) denote the family of all graph-group pairs
(Γ, G) such that Γ is a finite connected G-oriented graph of valency m.

Graphs contained in OG(2) are simply oriented cycles. The family OG(4), on
the other hand, has been studied for several decades now, see for instance [4-6].
These papers suggest a framework for describing the structure of graphs Γ for pairs
(Γ, G) in the class OG(4) by considering various types of quotients based on the
structure of certain kinds of cycles of Γ. In this new approach we study quotients
based on normal subgroups of the group G.


Normal Quotients Given a pair (Γ, G) ∈ OG(4) and a normal subgroup N of G,
we define a new graph Γ_N as follows: the vertex set of Γ_N consists of all N-orbits
on the vertices of Γ, and there is an edge between two N-orbits {B, C} in Γ_N if
and only if there is an edge of the form {α, β} in Γ, with α ∈ B and β ∈ C. The
graph Γ_N is called a G-normal-quotient of Γ. The group G induces a subgroup G_N
of automorphisms of Γ_N, namely G_N ≅ G/K for some normal subgroup K of G
such that N ≤ K. The K-orbits are the same as the N-orbits so Γ_K = Γ_N, although
sometimes K may be strictly larger than N.
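The quotient construction is easy to carry out mechanically once N is given by generators. The sketch below is a minimal illustration on a toy graph (an undirected 6-cycle, which is not itself a member of the family studied here), with N generated by a rotation; the function names orbits and normal_quotient are illustrative.

```python
def orbits(points, gens):
    """Map each point to its orbit (a frozenset) under the group generated by gens.

    Each generator is a dict point -> point describing a permutation of the points.
    """
    orbit_of = {}
    for p in points:
        if p in orbit_of:
            continue
        orb, stack = {p}, [p]
        while stack:
            q = stack.pop()
            for g in gens:
                r = g[q]
                if r not in orb:
                    orb.add(r)
                    stack.append(r)
        block = frozenset(orb)
        for q in orb:
            orbit_of[q] = block
    return orbit_of

def normal_quotient(vertices, edges, gens):
    """Vertices are the N-orbits; two distinct orbits are adjacent if some edge joins them."""
    orbit_of = orbits(vertices, gens)
    q_vertices = set(orbit_of.values())
    q_edges = {frozenset((orbit_of[a], orbit_of[b]))
               for a, b in edges if orbit_of[a] != orbit_of[b]}
    return q_vertices, q_edges

# Toy example: the 6-cycle, with N generated by a rotation through s steps.
vertices = list(range(6))
edges = [(i, (i + 1) % 6) for i in range(6)]

def rotation(s):
    return {i: (i + s) % 6 for i in range(6)}

sizes = {s: tuple(len(part) for part in normal_quotient(vertices, edges, [rotation(s)]))
         for s in (1, 2, 3)}
```

In this toy case a rotation through 3 steps gives a 3-cycle as quotient, a rotation through 2 steps gives K_2 (the two orbits form the bipartition), and a rotation through 1 step is transitive and gives a single vertex.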

In general, the pair (Γ_N, G_N) need not lie in OG(4). For instance, if the normal
subgroup N is transitive on the vertex set of Γ, then Γ_N consists of just a single
vertex. If the graph Γ is bipartite and the two N-orbits form the bipartition, then
Γ_N will be isomorphic to the complete graph on two vertices K_2. In other cases the
quotient graph Γ_N may also be a cycle graph C_r, for r ≥ 3. These three types of
quotients are defined to be degenerate in the sense that in each of these cases Γ_N
does not have valency 4 and so (Γ_N, G_N) ∉ OG(4). It turns out that these cases are
the only obstacles to the pair (Γ_N, G_N) lying in OG(4).


Theorem 1 ([1] Theorem 1.1) Let (Γ, G) ∈ OG(4) with vertex set X, and let N
be a normal subgroup of G. Then G induces a permutation group G_N on the set of
N-orbits in X, and either


(i) (Γ_N, G_N) is also in OG(4), Γ is a G-normal cover of Γ_N, N is semiregular on
vertices, and G_N ≅ G/N; or

(ii) (Γ_N, G_N) is a degenerate pair (i.e. Γ_N is isomorphic to K_1, K_2 or C_r, for some
r ≥ 3).


This leads to a framework for studying the family OG(4) using normal quotient
reduction. The first goal of this approach is to develop a theory to describe the ‘basic’
pairs in OG(4). A graph-group pair (Γ, G) ∈ OG(4) is said to be basic if all of its
G-normal quotients relative to non-trivial normal subgroups are degenerate. Since
Theorem 1 ensures that every member of OG(4) is a normal cover of a basic pair,
the second aim of this framework is to develop a theory to describe the G-normal
covers of these basic pairs. This approach has been successfully used in the study
of other families of graphs with prescribed symmetry properties, see for instance
[7-9].

The basic pairs may be further divided into three types. A pair (Γ, G) ∈ OG(4)
is said to be basic of quasiprimitive type if all G-normal quotients Γ_N of Γ are
isomorphic to K_1. This occurs precisely when all non-trivial normal subgroups
of G are transitive on the vertices of Γ. (Such a permutation group is said to be
quasiprimitive.)

If the only normal quotients of a basic pair (Γ, G) ∈ OG(4) are the graphs K_1
or K_2, and Γ has at least one G-normal quotient isomorphic to K_2, then (Γ, G) is
said to be basic of biquasiprimitive type. (The group G here is biquasiprimitive: it is
not quasiprimitive but each nontrivial normal subgroup has at most two orbits.) The
other basic pairs in OG(4) must have at least one normal quotient isomorphic to a
cycle graph C_r, and these basic pairs are said to be of cycle type.

The basic pairs of quasiprimitive type have been analysed in [1], and further 
analysis was conducted on basic pairs of cycle type in [2] and [3]. Although more 
remains to be done to describe the structure of basic pairs of cycle type, the main 
focus of our work is the biquasiprimitive case. 


Basic Pairs of Biquasiprimitive Type: Early Results Our current work aims to 
develop a theory to describe the basic pairs of biquasiprimitive type. Following the 
work done in [1] describing quasiprimitive basic pairs, we aim to produce similar 
structural results and constructions for the biquasiprimitive case. In [10] there is a 
group theoretic tool available for studying finite biquasiprimitive groups analogous 
to the O’Nan-Scott Theorem for finite primitive and quasiprimitive permutation 
groups. We outline our general approach below, though this work is still in progress. 

Let Γ be a graph with vertex set X and suppose that (Γ, G) ∈ OG(4) is basic
of biquasiprimitive type for some group G. Then there exists a normal subgroup N
of G with exactly two orbits on X, and all normal subgroups of G have at most
two orbits. It is easy to see that Γ is bipartite: since Γ is connected there is an
edge joining vertices in different N-orbits, and since G normalises N and is edge-
transitive, each edge joins vertices in different N-orbits. Thus the two orbits of N
form a bipartition of Γ.

Let {Δ, Δ'} denote the bipartition of the vertices of Γ, and let G^+ be the index 2
subgroup of G fixing Δ (and Δ') setwise. Since Γ is G-vertex-transitive it follows
that G^+ is transitive on both Δ and Δ'. As we just saw, any non-trivial intransitive
normal subgroup N of G must have the sets Δ, Δ' as its two orbits on X, and hence
N ≤ G^+. It can also be shown that the action of G^+ on Δ is faithful.

Consider now a minimal normal subgroup M of G^+. If M is also normal in G
then the M-orbits on X are Δ and Δ' and M is a minimal normal subgroup of G.

On the other hand, if M is not normal in G, then for any element x ∈ G\G^+, we
see that M^x is also a minimal normal subgroup of (G^+)^x = G^+, and furthermore,
M ≠ M^x since otherwise G would normalise M. It follows from the minimality of
M that M ∩ M^x = 1 and hence that M × M^x is contained in G^+ and is normal in
G. Letting N := M × M^x, we see that the N-orbits on X are Δ and Δ' since N is
normal in G.

In summary, we always have a normal subgroup N of G contained in G^+ with
Δ and Δ' the N-orbits in X, and such that either


(a) N is a minimal normal subgroup of G^+ and N ≅ T^k for some simple group T
and k ≥ 1; or

(b) N = M × M^x where x ∈ G\G^+, and M ≅ T^ℓ is a minimal normal subgroup
of G^+ with T a simple group and ℓ ≥ 1. In particular, N ≅ T^k with k = 2ℓ.

For a vertex α ∈ Δ, the vertex stabilisers G_α and G^+_α are equal and G^+ = N G_α.
Moreover, since the vertex stabilisers of 4-valent G-oriented graphs are 2-groups, it
follows that G^+/N ≅ G_α/N_α is also a 2-group.

Hence by analysing the minimal normal subgroups of G and G^+ as above, and
considering the various possibilities for the direct factors T of N, we can reduce
the possibilities for basic pairs of biquasiprimitive type to several cases. In fact, our
main result so far uses combinatorial arguments to bound the values of ℓ and k in
cases (a) and (b) above, though this is still a work in progress.

We give an infinite family of examples of basic biquasiprimitive pairs. These
graphs have order 2p^2, with p prime, and G^+ has an elementary abelian normal
subgroup. There were no analogues of these examples in the basic quasiprimitive
case since the minimal normal subgroups in that case are nonabelian [1, Theorem 1.3].


Example 1 Let p be a prime such that p ≡ 3 (mod 4), let Δ = {(x, y)_0 | x, y ∈
C_p} and Δ' = {(x, y)_1 | x, y ∈ C_p}, two copies of the additive group N = C_p^2, and
let X = Δ ∪ Δ'. Define δ ∈ Sym(X) by (x, y)_ε^δ = (y, −x)_{1−ε}, for x, y ∈ C_p and
ε ∈ {0, 1}, and let G := N ⋊ ⟨δ⟩. Note that δ has order 4 and normalises N. Also
for α = (0, 0)_0, G_α = G^+_α = ⟨δ^2⟩ ≅ C_2.

Define the G-oriented graph Γ to have vertex set X and, for each x, y ∈ C_p,
edges oriented from (x, y)_0 to (x, y ± 1)_1 and from (x ± 1, y)_1 to (x, y)_0.

Then Γ has valency 4, G is vertex- and edge-transitive, and G preserves the edge
orientation. Thus (Γ, G) ∈ OG(4). Also Γ is bipartite, and G is biquasiprimitive
(verifying the latter property uses the fact that p ≡ 3 (mod 4)). Hence (Γ, G) is
basic of biquasiprimitive type.
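Most of these claims can be confirmed by machine for small primes. The sketch below assumes the edge rule reads: arcs from (x, y)_0 to (x, y ± 1)_1 and from (x ± 1, y)_1 to (x, y)_0 (the ± is an assumption, needed for valency 4), with the order-4 map (x, y)_ε ↦ (y, −x)_{1−ε} and the translations by N acting on both copies. Vertices are encoded as triples (x, y, ε); all function names are illustrative, and biquasiprimitivity is not tested.

```python
from collections import Counter

def example_arcs(p):
    """Oriented edges (tail, head) of the graph, with vertices encoded as (x, y, eps)."""
    arcs = set()
    for x in range(p):
        for y in range(p):
            for s in (1, -1):
                arcs.add(((x, y, 0), (x, (y + s) % p, 1)))
                arcs.add((((x + s) % p, y, 1), (x, y, 0)))
    return arcs

def twist(v, p):
    """The order-4 map (x, y)_eps -> (y, -x)_{1-eps}."""
    x, y, e = v
    return (y, (-x) % p, 1 - e)

def shift(v, a, b, p):
    """Translation by (a, b) in N = C_p^2, acting on both copies."""
    x, y, e = v
    return ((x + a) % p, (y + b) % p, e)

def check_example(p):
    arcs = example_arcs(p)
    vertices = {(x, y, e) for x in range(p) for y in range(p) for e in (0, 1)}
    gens = [lambda v: twist(v, p), lambda v: shift(v, 1, 0, p), lambda v: shift(v, 0, 1, p)]

    # Each generator maps the arc set onto itself, so the orientation is preserved.
    preserved = all({(g(s), g(t)) for s, t in arcs} == arcs for g in gens)

    # Two out-arcs and two in-arcs at every vertex, i.e. valency 4.
    out_deg, in_deg = Counter(s for s, _ in arcs), Counter(t for _, t in arcs)
    valency_4 = all(out_deg[v] == 2 and in_deg[v] == 2 for v in vertices)

    # The group generated by gens has a single orbit on arcs.
    start = next(iter(arcs))
    orbit, stack = {start}, [start]
    while stack:
        s, t = stack.pop()
        for g in gens:
            image = (g(s), g(t))
            if image not in orbit:
                orbit.add(image)
                stack.append(image)
    arc_orbit_is_everything = orbit == arcs

    # Connectivity of the underlying undirected graph.
    nbrs = {v: set() for v in vertices}
    for s, t in arcs:
        nbrs[s].add(t)
        nbrs[t].add(s)
    root = (0, 0, 0)
    reached, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for w in nbrs[v] - reached:
            reached.add(w)
            stack.append(w)
    connected = reached == vertices

    return preserved, valency_4, arc_orbit_is_everything, connected
```

For p = 3 and p = 7 all four checks succeed. Transitivity on oriented edges also gives vertex-transitivity here, since every vertex is the tail of some arc.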


Our goal is to refine our restrictions on k and ℓ to such an extent that we can give
constructions of families of examples for all possible values of these parameters.
Example 1 defines the graphs as BiCayley graphs, and other BiCayley examples
arise naturally in our context. However, in the case where G^+ has no normal
subgroup which is regular on the two G^+ orbits Δ and Δ', different constructions
will be required.

As noted above, these results develop the work initiated in [1] and developed
further in [2, 3].


The Contributions of W.T. Tutte
to Matroid Theory


Graham Farr and James Oxley


Abstract Bill Tutte was born on May 14, 1917 in Newmarket, England. In 1935, he 
began studying at Trinity College, Cambridge reading natural sciences specializing 
in chemistry. After completing a master’s degree in chemistry in 1940, he was 
recruited to work at Bletchley Park as one of an elite group of codebreakers that 
included Alan Turing. While there, Tutte performed “one of the greatest intellectual 
feats of the Second World War.” Returning to Cambridge in 1945, he completed a 
Ph.D. in mathematics in 1948. Thereafter, he worked in Canada, first in Toronto and 
then as a founding member of the Department of Combinatorics and Optimization 
at the University of Waterloo. His contributions to graph theory alone mark him 
as arguably the twentieth century’s leading researcher in that subject. He also 
made groundbreaking contributions to matroid theory including proving the first 
excluded-minor theorems for matroids, one of which generalized Kuratowski’s 
Theorem. He extended Menger’s Theorem to matroids and laid the foundations for 
structural matroid theory. In addition, he introduced the Tutte polynomial for graphs 
and extended it and some relatives to matroids. This paper will highlight some of 
his many contributions focusing particularly on those to matroid theory. 


1 Introduction 


The task of summarizing Bill Tutte’s mathematical contributions in a short paper 
is an impossible one. There are too many, they are too deep, and their implications 
are too far-reaching. This paper will discuss certain of these contributions giving 
particular emphasis to his work in matroid theory and the way in which that work 
links to graph theory. The terminology used here will follow Oxley [16].

This paper will attempt to give insight into the thoughts and motivations that
guided Tutte’s mathematical endeavours. To do this, we shall quote extensively from 
three sources. Dan Younger, Tutte’s long-time colleague and friend at the University 
of Waterloo, wrote the paper William Thomas Tutte 14 May 1917-2 May 2002 [53] 
in the Biographical Memoirs of Fellows of the Royal Society, and that paper includes 
many quotes from Tutte that are reproduced here. In 1999, Tutte presented the 
Richard Rado Lecture The Coming of the Matroids at the British Combinatorial 
Conference in Canterbury. We will also quote from Tutte’s write-up of that lecture 
in the conference proceedings [47]. Finally, we draw on commentaries by Tutte on 
his own papers that appear in Selected Papers of W.T. Tutte I, II [45, 46], published
in 1979 to mark Tutte’s 60th birthday. 

These Selected Papers were edited by D. McCarthy and R. G. Stanton. Ralph 
Stanton was a noted mathematician who had been the first Dean of Graduate 
Studies at the University of Waterloo and who recruited Tutte to Waterloo from 
Toronto in 1962. Stanton’s foreword to the Selected Papers provides a context for 
the magnitude of Tutte’s achievements: 


Not too many people are privileged to practically create a subject, but there have been 
several this century. Albert Einstein created Relativity ...Similarly, modern Statistics owes 
its existence to Sir Ronald Fisher’s exceptionally brilliant and creative work. And I think 
that Bill Tutte’s place in Graph Theory is exactly like that of Einstein in Relativity and that 
of Fisher in Statistics. He has been both a great creative artist and a great developer. 


Bill Tutte was born on May 14, 1917 in Newmarket, England. His family moved 
several times when he was young but they returned to the Newmarket area, to the 
village of Cheveley, when Bill was about seven. Bill attended the local school. In 
May, 1927 and again a year later, he won a scholarship to the Cambridge and County 
High School for Boys, some eighteen miles from his home. The first time he won, 
his parents judged that it was too far for their son to travel and he was kept home. 
A year later his parents permitted him to attend the school despite the long daily 
commute each way, by bike and by train [53, p. 287]. In the high school library, 
Bill came across Rouse Ball’s book Mathematical Recreations and Essays [1], first 
published in 1892. That book included discussions of chess-board recreations, map 
colouring problems, and unicursal problems (including Euler tours and Hamiltonian 
cycles). Some parts of his chemistry classes were [45, p. 1] pure graph theory and in 
his physics classes, he learned about electrical circuits and Kirchhoff’s Laws. Tutte 
wrote [45, p. 1], 


When I became an undergraduate at Trinity College, Cambridge, I already possessed much 
elementary graph-theoretical knowledge though I do not think I had this knowledge well- 
organized at the time. 


In 1935, Tutte began studying at Trinity College, Cambridge. He read natural 
sciences, specializing in chemistry. From the beginning, he attended lectures of the 
Trinity Mathematical Society. Three other members of that Society, all of whom 
were first-year mathematics students, were R. Leonard Brooks, Cedric A.B. Smith, 
and Arthur H. Stone. This group had various names [47, p. 4] including “The 
Important Members, The Four Horsemen, The Gang of Four.” They became fast
friends, spending many hours discussing mathematical problems. Tutte wrote [53,
p. 288], 


As time went on, I yielded more and more to the seductions of Mathematics. 


Tutte’s first paper [22], in chemistry, was published in 1939 in the prestigious 
scientific journal Nature. His first mathematical paper, The dissection of rectangles 
into squares, was published with Brooks, Smith, and Stone [3] in 1940 in the Duke 
Mathematical Journal. Their motivating problem was to divide a square into a finite 
number of unequal squares. In 1939, Sprague [21] from Berlin published a solution 
to this problem just as The Four were in the final stages of preparing their paper 
in which, ingeniously, they converted the original problem into one for electrical 
networks. Writing later about The Four’s paper, Tutte said [45, p. 3], 


I value the paper not so much for its ostensible geometrical results, which Sprague largely 
anticipated, as for its graph-theoretical methods and observations. 


Tutte went on to note [45, p. 4] that, in this paper, 


two streams of graph theory from my early studies came together, Kirchhoff’s Laws from 
my Physics lessons, and planar graphs from Rouse Ball’s account of the Four Colour 
Problem.


Tutte wrote a very readable account of this work in Martin Gardner’s Mathemat- 
ical Games column in Scientific American in November, 1958, and that account is 
now available online [32]. 

The Four’s paper is remarkable not only for its solution to the squaring-the- 
square problem and its beautiful graph-theoretic ideas, but also for the extent to 
which it contains the seeds of Tutte’s later work. In it, we find planarity, duality, 
flows, numbers of spanning trees, a deletion-contraction relation, symmetry, and 
above all the powerful application of linear algebra to graph theory.¹

After completing his chemistry degree in 1938, Tutte worked as a postgraduate 
student in physical chemistry at Cambridge’s famous Cavendish Laboratory com- 
pleting a master’s degree in 1940. Tutte’s work in chemistry [53, p. 288] 


convinced him that he would not succeed as an experimenter. He asked his tutor, Patrick 
Duff, to arrange his transfer from natural sciences to mathematics. This transfer took place 
at the end of 1940. 


Tutte later wrote [47, p. 4], 


I left Cambridge in 1941 with the idea that graph theory could be reduced to abstract algebra 
but that it might not be the conventional kind of algebra. 


¹ Incidentally, it may also be regarded as Tutte’s first paper on graph drawing. In that field, too, he
is regarded as a pioneer, mostly because of his 1963 paper ‘How to draw a graph’ [35]. But it is still 
worth noting the graph-drawing aspect of his very first mathematics paper: the squared rectangles 
are a type of simultaneous “visibility drawing” of a planar graph and its dual.

Like so many of the brightest minds in Britain at the time, Tutte was recruited
as a codebreaker and worked at the Bletchley Park Research Station—now famous, 
but then top secret—from 1941 till 1945. He wrote [47, p. 5], 


at Bletchley I was learning an odd new kind of linear algebra. 


Narrating the 2011 BBC documentary, Code-Breakers: Bletchley Park’s Lost 
Heroes [2], the actress Keeley Hawes says, 


This is Bletchley Park. In 1939, it became the wartime headquarters of MI6. If you know 
anything, about what happened here, it will be that a man named Alan Turing broke the 
German Naval code known as ‘Enigma’ and saved the nation; and he did. But that’s only 
half the story. 


Then Captain Jerry Roberts, who had been a Senior Cryptographer at the Park
during the war, speaks:


There were three heroes of Bletchley Park. The first was Alan Turing; the second was Bill
Tutte, who broke the Tunny system, a quite amazing feat; and the third was Tommy Flowers
who, with no guidelines, built the first computer ever.


Tutte’s work at Bletchley Park was truly profound. The problem he faced was 
to break into communications encoded by an unknown cypher machine, codenamed 
Tunny by the British. (Its real name was Lorenz SZ40.) This machine was much 
more secure and complex than the famous Enigma machine, reflecting its use at 
the highest levels of the Nazi regime including by Hitler himself. Furthermore, 
although the British knew the architecture of the Enigma machine, the design of 
the Tunny machine was a complete mystery to them. The problem Tutte faced 
was thus far harder than the Enigma problem which Turing is justly celebrated 
for solving. Tutte’s first problem was the diagnosis of the Tunny machine (that is, 
determining how it worked), just from collected cyphertext; only then could he and 
his colleagues move on to cryptanalysis. Tutte made the crucial breakthrough in 
diagnosis, an astonishing achievement. He then went on to develop cryptanalysis 
algorithms. These were very computationally intensive. The Colossus cryptanalytic 
computers, designed by Tommy Flowers, were built to implement Tutte’s algorithms 
and performed service of incalculable value for the remainder of the war. 

The University of Waterloo’s magazine for Spring, 2015 has an article Keeping 
Secrets about this work in which one reads, 


According to Bletchley Park’s historians, General Dwight D. Eisenhower himself described 
Tutte’s work as one of the greatest intellectual feats of the Second World War. 


Some details of this work can be found in [7, 12, 53]. Tutte’s own account of
these efforts appears in [47, 48].

As a consequence of Tutte’s top-secret code-breaking work at Bletchley Park, he 
was elected a Fellow of Trinity College in 1942. He wrote [53, p. 291] of this, 


It seemed to me that the election might be criticized as a breach of security, but no harm 
came of it.


In 1945, after the war, Tutte returned to Cambridge for a Ph.D. in mathematics,
supervised by Shaun Wylie, with whom Tutte had worked at Bletchley. Tutte 
completed his Ph.D. thesis, An algebraic theory of graphs, in 1948 despite Wylie’s 
advice [53, p. 291] to 


drop graph theory and take up something respectable, such as differential equations. 


Tutte’s thesis, which was xi + 417 pages, was an extraordinary accomplishment 
with the ideas in it forming the basis for much of his work for the next two decades. 
We discuss it in more detail in Sect. 8. He wrote [53, p. 291] of his decision to stick 
with graph theory, 


If one assumes that graph theory was my métier, it was just as well that I had the prestige 
of a Fellow of Trinity. 


2 Tutte’s Doctoral Research 


Tutte’s first year of doctoral research was remarkably productive. He submitted six 
papers during the period from November 1945 to December 1946, including four 
that became classics of the field, though most of them bore no relation to his Ph.D. 
thesis. 

In the first of these classic papers [24], Tutte found a 46-vertex counterexample 
to an 1884 conjecture of Tait [23] that every cubic planar graph is Hamiltonian.

Tutte’s paper A ring in graph theory is his first paper on the Tutte polynomial and 
one of his most profound. His polynomial is not given explicitly in any of its usual 
forms, and its presence is somewhat obscured by some technical details and the use 
of multivariate polynomials to develop much of the theory. But the main ingredients 
of Tutte-polynomial theory are all there. We return to it shortly, in Sect. 2.1. 

His third classic paper [27] studied symmetry in cubic graphs. An s-arc in a 
graph is a walk with s edges in which consecutive edges are always distinct. Apart 
from this constraint, vertices and edges may occur repeatedly. Note that a walk and 
its reverse are considered to be different. A graph G is s-arc-transitive if it has at 
least one s-arc and, for any two s-arcs, there is an automorphism of G that maps 
one to the other. This is a very strong symmetry property indeed. Tutte showed that 
there are no s-arc-transitive cubic graphs with s > 5, gave an inequality relating 
girth to s, and characterized graphs where the inequality comes as close to equality 
as possible for a given girth; these are the g-cages, a finite family of graphs, the most 
complex being his 8-cage. This paper became enormously influential in the theory 
of symmetric graphs. 

The fourth of these groundbreaking papers [25] proved the characterization of 
when a graph has a 1-factor, or perfect matching. This theorem is now a staple of 
most introductory courses on graph theory.


2.1 ‘A Ring in Graph Theory’


The starting point and driving principle of this paper is the observation that certain 
functions on graphs obey deletion-contraction relations. As an example of such a 
function, Tutte considered the complexity C(G) of a connected graph G, this being 
the number of spanning trees of G. When The Four were working on the problem of 
partitioning a rectangle into unequal squares, they observed that complexity obeys 
the following recursion. 


Lemma 1 In a graph G, let e be an edge that is neither a loop nor a cut edge. Then
C(G) = C(G\e) + C(G/e). 


Proof Partition the set of spanning trees of G into 


(i) those not using e; and 
(ii) those using e. 


There are C(G\e) spanning trees in (i); and the spanning trees in (ii) match up with 
the spanning trees of G/e. 
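
The recursion in Lemma 1 can be checked on small graphs. The following is a minimal Python sketch, not taken from Tutte's paper: the function names and the edge-list representation are assumptions made for illustration. It applies the recursion to any non-loop edge, which is valid provided a disconnected graph is given complexity 0 (so that a cut edge contributes only through the contraction term).

```python
def contract(vertices, edges, u, v):
    """Merge vertex v into vertex u; the contracted edge itself is assumed already removed."""
    new_vertices = vertices - {v}
    new_edges = [(u if a == v else a, u if b == v else b) for a, b in edges]
    return new_vertices, new_edges


def count_spanning_trees(vertices, edges):
    """Complexity C(G) via C(G) = C(G\\e) + C(G/e); returns 0 when G is disconnected."""
    vertices = frozenset(vertices)
    edges = [e for e in edges if e[0] != e[1]]  # loops lie in no spanning tree
    if not edges:
        return 1 if len(vertices) == 1 else 0
    (u, v), rest = edges[0], edges[1:]
    deleted = count_spanning_trees(vertices, rest)          # trees avoiding e
    cv, ce = contract(vertices, rest, u, v)
    contracted = count_spanning_trees(cv, ce)               # trees using e
    return deleted + contracted


K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_spanning_trees(range(4), K4))
```

For the complete graph on four vertices the result agrees with Cayley's formula, which gives 4^2 spanning trees.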


Tutte wrote [45, p. 51], 


I wondered if complexity, or tree number, could be characterized by the above identity alone 
and decided that it could not. 


His paper considered the following. 


Problem 1 What isomorphism-invariant functions W of graphs satisfy 
W(G) = W(G\e) + W(G/e) 


for all non-loop edges e of G? 


He called such a function, taking values in an abelian group, a W-function. A 
W-function is a V-function if 


$W(G_1 \cup G_2) = W(G_1)W(G_2)$


for all disjoint graphs $G_1$ and $G_2$, where W now takes values in a commutative ring
with unity. 

For a graph G, let $P(G; \lambda)$ denote the number of proper $\lambda$-colourings of G.
Tutte noted that $(-1)^{|V(G)|}$ times $P(G; \lambda)$ is an example of a V-function. This is an
immediate consequence of the following lemma, which was first proved by Foster,
in a “Note added in proof” in [51, p. 718].


Lemma 2 For a non-loop edge e of a graph G,


$$P(G; \lambda) = P(G \backslash e; \lambda) - P(G/e; \lambda).$$


The Contributions of W.T. Tutte to Matroid Theory 349 


Proof Let e have distinct endpoints u and v. Partition the proper $\lambda$-colourings of
G\e into


(i) those in which u and v have different colours; and
(ii) those in which u and v have the same colour.


In (i), we are counting the number of proper $\lambda$-colourings of G; while (ii)
corresponds to the number of proper $\lambda$-colourings of G/e.
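
Lemma 2 can be tested numerically. The sketch below is an illustration under our own naming (the functions `colourings_brute_force` and `colourings_by_recursion` are hypothetical, not from the sources cited): it counts proper colourings once by exhaustive enumeration and once by the deletion-contraction recursion, so the two can be compared.

```python
from itertools import product


def colourings_brute_force(vertices, edges, q):
    """Count proper q-colourings by trying every assignment of colours."""
    vs = sorted(vertices)
    count = 0
    for assignment in product(range(q), repeat=len(vs)):
        colour = dict(zip(vs, assignment))
        if all(colour[a] != colour[b] for a, b in edges):
            count += 1
    return count


def colourings_by_recursion(vertices, edges, q):
    """Count proper q-colourings via P(G) = P(G\\e) - P(G/e) for a non-loop edge e."""
    vertices = frozenset(vertices)
    if any(a == b for a, b in edges):
        return 0                      # a graph with a loop has no proper colouring
    if not edges:
        return q ** len(vertices)     # every assignment is proper
    (u, v), rest = edges[0], edges[1:]
    merged_vertices = vertices - {v}  # contract e by merging v into u
    merged_edges = [(u if a == v else a, u if b == v else b) for a, b in rest]
    return (colourings_by_recursion(vertices, rest, q)
            - colourings_by_recursion(merged_vertices, merged_edges, q))


C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(colourings_brute_force(range(4), C4, 3), colourings_by_recursion(range(4), C4, 3))
```

Note that when e has a parallel edge, contracting e creates a loop, so the contraction term vanishes; this matches the fact that deleting one of two parallel edges does not change the number of proper colourings.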


Tutte’s insightful breakthrough here was to focus on the two recursions: 


1. W(G) = W(G\e) + W(G/e); and 
2. $W(G_1 \cup G_2) = W(G_1)W(G_2)$.


Many readers will recognize here the origins of the Tutte polynomial. There 
are technical differences between the multivariate polynomials in this paper and 
the more familiar polynomials of Whitney and Tutte, which we will define in 
Sect. 3. But some simple adjustments (such as substitutions to give bivariate
specializations, and dividing by $x^{k(G)}$ or $(x - 1)^{k(G)}$, where k(G) is the number of
components of G) reveal both the Whitney rank
generating function and Tutte polynomial, albeit in period costume. The relationship
between these two polynomials, which is just a coordinate translation of one step in 
each direction, is subsumed by a more general result in the paper. In fact, most of 
the main ingredients of Tutte-polynomial theory are here, with deletion-contraction 
relations at the core. The details of this paper are discussed in [9]. 


3 Graph Polynomials 


In 1954, Tutte published [29] A contribution to the theory of chromatic polynomials. 
By then, he was at the University of Toronto having been recruited there in 1948 
by H.S.M. Coxeter, another famous graduate of Trinity College, Cambridge. In 
this paper, Tutte introduced what he called the dichromate of a graph, this now 
being known as the Tutte polynomial of the graph. The dichromate is a two-variable 
polynomial not to be confused with another two-variable polynomial Tutte labelled 
the dichromatic polynomial of a graph. The latter is now known as the Whitney 
rank-generating function of the graph. Welsh [50, p. 44] draws attention to the 
rather confused history of these polynomials and their nomenclature. This history is 
clarified in [8, 9] where Tutte [49, p. 8] is quoted concerning the use of the name 
“Tutte polynomial” as saying,


This may be unfair to Hassler Whitney who knew and used analogous coefficients without
bothering to affix them to two variables.


Tutte cites Whitney’s 1932 paper [51] as his source. Formally, let G be a graph 
with edge set E. For a subset X of E, let G[X] be the subgraph of G induced by
X, and let r(X), the rank of X, be the difference between the number of vertices
and the number of connected components of G[X]. The Whitney rank-generating
function R(G; x, y) of G is


$$R(G; x, y) = \sum_{X \subseteq E} x^{r(E) - r(X)} y^{|X| - r(X)}.$$


The Tutte polynomial T(G; x, y) is the translation of R(G; x, y) defined by

$$T(G; x, y) = R(G; x - 1, y - 1).$$


In particular, when G is connected, T(G; 1, 1) is the complexity of G, that is, its
number of spanning trees. When G has k(G) components, $P(G; \lambda)$, the number
of proper $\lambda$-colourings of G, is $\lambda^{k(G)} (-1)^{r(E)} T(G; 1 - \lambda, 0)$. Another important
evaluation of the Tutte polynomial involves flows.
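
The definitions above translate directly into a subset-sum computation. The following is a small sketch under our own conventions (vertices are numbered 0 to n - 1, edges are pairs, and the function names are hypothetical); it evaluates the Whitney rank-generating function by brute force and obtains the Tutte polynomial by the shift of variables. It is exponential in the number of edges and is meant only for checking small examples.

```python
from itertools import combinations


def rank(n, edge_subset):
    """r(X): number of vertices minus number of components of the graph (V, X)."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    r = 0
    for a, b in edge_subset:
        ra, rb = find(a), find(b)
        if ra != rb:          # the edge joins two components, so the rank grows
            parent[ra] = rb
            r += 1
    return r


def whitney_R(n, edges, x, y):
    """Sum of x^(r(E)-r(X)) * y^(|X|-r(X)) over all subsets X of E."""
    r_E = rank(n, edges)
    total = 0
    for size in range(len(edges) + 1):
        for X in combinations(edges, size):
            r_X = rank(n, X)
            total += x ** (r_E - r_X) * y ** (size - r_X)   # Python takes 0**0 == 1
    return total


def tutte_T(n, edges, x, y):
    return whitney_R(n, edges, x - 1, y - 1)


K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
spanning_trees = tutte_T(4, K4, 1, 1)
four_colourings = 4 ** 1 * (-1) ** rank(4, K4) * tutte_T(4, K4, 1 - 4, 0)
print(spanning_trees, four_colourings)
```

Counting isolated vertices in or out of G[X] does not change r(X), so the sketch works with the spanning subgraph on all n vertices.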

To define a flow in a graph G, first assign a direction to every edge of G. A
nowhere-zero k-flow assigns a flow value f(e) from $\mathbb{Z}_k - \{0\}$ to every edge e of
G such that, at every vertex v, the sum of the flows on the edges directed into v 
equals the sum of the flows on the edges directed out from v. Intuitively, Kirchhoff’s 
Current Law holds at each vertex of G. For example, G has a nowhere-zero 2-flow 
if and only if every vertex has even degree. When G is connected, this is, of course, 
equivalent to G being Eulerian. 

It is straightforward to show that if G has a nowhere-zero k-flow, then G has no 
cut edges. Moreover, a plane graph G without cut edges has a nowhere-zero k-flow 
if and only if its dual G* is k-colourable. 

Let A be an additive abelian group. A nowhere-zero A-flow takes flow values 
from A — {0} such that, at every vertex, the flow into the vertex equals the flow 
out from that vertex. Thus a nowhere-zero k-flow is just a nowhere-zero $\mathbb{Z}_k$-flow.
Remarkably, Tutte [29] showed that the number of nowhere-zero A-flows on a graph
depends only on the cardinality of A. 


Proposition 1 (Tutte, 1954) For n ≥ 2, let A be an abelian group with n elements
and G be a graph without cut edges. Then the number of nowhere-zero A-flows on 
G equals the number of nowhere-zero n-flows on G. 
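
Proposition 1 can be observed on a small example by exhaustive search. The sketch below is an illustration only: the representation of a finite abelian group as a product of cyclic groups given by a tuple `moduli`, and the function name, are our assumptions. It counts the assignments of non-zero group elements to the arcs of an oriented graph that satisfy Kirchhoff's Current Law at every vertex.

```python
from itertools import product


def count_nowhere_zero_flows(n, arcs, moduli):
    """Count nowhere-zero A-flows for A = Z_{m1} x ... x Z_{mt} on vertices 0..n-1.

    arcs is a list of (tail, head) pairs; the orientation is arbitrary.
    """
    elements = list(product(*(range(m) for m in moduli)))
    zero = tuple(0 for _ in moduli)
    nonzero = [a for a in elements if a != zero]
    count = 0
    for values in product(nonzero, repeat=len(arcs)):
        # net[v] accumulates (flow in) - (flow out) at v, coordinate by coordinate
        net = [[0] * len(moduli) for _ in range(n)]
        for (tail, head), val in zip(arcs, values):
            for i, m in enumerate(moduli):
                net[tail][i] = (net[tail][i] - val[i]) % m
                net[head][i] = (net[head][i] + val[i]) % m
        if all(c == 0 for row in net for c in row):
            count += 1
    return count


K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_nowhere_zero_flows(4, K4, (4,)), count_nowhere_zero_flows(4, K4, (2, 2)))
```

The two groups of order four, the cyclic group and the Klein four-group, give the same count on this graph, as the proposition predicts. The search is exponential in the number of arcs, so it is suitable only for very small graphs.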


Tutte [29] made two striking conjectures about flows. 


Conjecture 1 (Tutte, 1954) There is a fixed number t such that every graph without
cut edges has a nowhere-zero t-flow. 


This conjecture was not settled for over 20 years until Jaeger [14] proved the 
following. 


Theorem 1 (Jaeger, 1976) Every graph without cut edges has a nowhere-zero 8-flow.


Tutte’s second flow conjecture is even more elusive and still remains open.


Conjecture 2 (Tutte, 1954) Every graph without cut edges has a nowhere-zero 5- 
flow. 




The best partial result towards this 5-Flow Conjecture was proved by Sey- 
mour [19]. Just as Jaeger’s proof relied on the fact that 8 is 2 cubed, Seymour’s 
proof relies on 6 being the product of 3 and 2. 


Theorem 2 (Seymour, 1981) Every graph without cut edges has a nowhere-zero 
6-flow. 


4 Matroids 


Before discussing Tutte’s contributions to matroid theory, we briefly introduce 
matroids to readers unfamiliar with them. 

Let A be a matrix having E as its set of column labels. Let $\mathcal{I}$ be the collection
of subsets X of E such that X labels a linearly independent set of columns. The
pair $(E, \mathcal{I})$ is an example of a matroid M with the members of the set $\mathcal{I}$ being its
independent sets. We denote this matroid by M[A]. In general, $(E, \mathcal{I})$ is a matroid
M with ground set E if $\mathcal{I}$ is a non-empty hereditary collection of subsets of the
finite set E with the property that, whenever X and Y are in $\mathcal{I}$ and |X| > |Y|, there
is an element x of X − Y such that $Y \cup \{x\} \in \mathcal{I}$. Subsets of E that are not in $\mathcal{I}$
are dependent and the minimal dependent sets are the circuits of M. Evidently, M is
uniquely determined by its collection of circuits. If G is a graph, there is a matroid 
M(G) having E(G) as its ground set and the set of edge sets of cycles of G as its 
set of circuits. The matroid M(G) is the cycle matroid of G. 

For a field F, a matroid M is F-representable if there is a matrix A over F such 
that M = M[A]. A GF(2)-representable matroid is called binary. For example, 
over GF (2), let 


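$$
A = \begin{array}{c}
\begin{array}{ccccccc} 1 & 2 & 3 & 4 & 5 & 6 & 7 \end{array} \\
\left[\begin{array}{ccccccc}
1 & 0 & 0 & 1 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 1
\end{array}\right]
\end{array}
$$

Then M[A] is a matroid with ground set {1, 2, ..., 7} whose circuits include
{4, 5, 6} since, over GF(2), the three corresponding vectors are linearly dependent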
although any two of them are linearly independent. This matroid is usually called 
the Fano matroid and is denoted by F7. A geometric representation of this matroid 
is shown in Fig. 1. In such a picture, three collinear points form a circuit as do four 
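coplanar points of which no three are collinear. The dual, $F_7^*$, of the Fano matroid
is the matroid $M[A^*]$ where

$$
A^* = \begin{array}{c}
\begin{array}{ccccccc} 1 & 2 & 3 & 4 & 5 & 6 & 7 \end{array} \\
\left[\begin{array}{ccccccc}
1 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & 0 & 0 & 1
\end{array}\right]
\end{array}
$$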

352 G. Farr and J. Oxley 


Fig. 1 The Fano matroid, F7


In general, if the n-element matroid $M = M[I_r \mid D]$, its dual $M^*$ is $M[D^T \mid I_{n-r}]$.
More generally, suppose M is a matroid on the set E having $\mathcal{B}$ as its set of maximal
independent sets (bases). The collection $\{E - B : B \in \mathcal{B}\}$ can be shown to be the
set of bases of a matroid on E; this matroid M* is the dual of M.
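
The statements about the Fano matroid and its dual can be verified mechanically. The following sketch is ours and not part of the original exposition: the two matrices are the binary matrices for the Fano matroid and its dual as we read them from the text, columns are encoded as integer bitmasks (an implementation choice), and all function names are hypothetical.

```python
from itertools import combinations

A = [[1, 0, 0, 1, 1, 0, 1],
     [0, 1, 0, 1, 0, 1, 1],
     [0, 0, 1, 0, 1, 1, 1]]

A_STAR = [[1, 1, 0, 1, 0, 0, 0],
          [1, 0, 1, 0, 1, 0, 0],
          [0, 1, 1, 0, 0, 1, 0],
          [1, 1, 1, 0, 0, 0, 1]]

E = range(1, 8)  # column labels 1..7


def column_as_int(matrix, j):
    """Encode column j (0-based) as an integer whose bits are the column entries."""
    return sum(row[j] << i for i, row in enumerate(matrix))


def gf2_rank(vectors):
    """Rank over GF(2) of a list of bitmask vectors (incremental XOR basis)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)   # clears the leading bit of b from v when it is set
        if v:
            basis.append(v)
    return len(basis)


def is_independent(matrix, subset):
    return gf2_rank([column_as_int(matrix, j - 1) for j in subset]) == len(subset)


def circuits(matrix, ground):
    """Minimal dependent sets: dependent, but independent after removing any element."""
    found = []
    for size in range(1, len(ground) + 1):
        for X in combinations(ground, size):
            if not is_independent(matrix, X) and all(
                    is_independent(matrix, X[:i] + X[i + 1:]) for i in range(size)):
                found.append(set(X))
    return found


def bases(matrix, ground):
    r = gf2_rank([column_as_int(matrix, j - 1) for j in ground])
    return {frozenset(X) for X in combinations(ground, r) if is_independent(matrix, X)}


fano_circuits = circuits(A, E)
print({4, 5, 6} in fano_circuits, len(fano_circuits))
print(bases(A_STAR, E) == {frozenset(E) - B for B in bases(A, E)})
```

The last line checks the complement description of the dual: the bases of the second matrix are exactly the complements of the bases of the first.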

The deletion of the element 1 from F7 is the matroid of the matrix that is obtained
from A by deleting the first column. The reader may wish to check that this deletion 
is actually equal to M(K4) where {2, 3, ..., 7} is the edge set of K4. The contraction 
of 1 from F7 is the matroid of the matrix that is obtained from A by deleting the 
first row and the first column. This contraction is the cycle matroid of the doubled 
triangle graph, obtained from a triangle with edge set {2, 6, 3} by adding 4, 7, and 
5 in parallel with 2, 6, and 3, respectively. In general, for a matroid M, the deletion 
of the element e from M is the matroid M\e having ground set $E - \{e\}$ and set
of independent sets $\{I \in \mathcal{I} : e \notin I\}$. Moreover, provided {e} is independent, the
contraction M/e of e from M is the matroid with ground set $E - \{e\}$ and set of
independent sets $\{I' \subseteq E - \{e\} : I' \cup \{e\} \in \mathcal{I}\}$. When {e} is dependent, we define
M/e to be M\e. A minor of M is any matroid that can be obtained from M by 
a sequence of deletions and contractions. As partially outlined above, every minor 
of an F-representable matroid is F-representable. This means that the class of F- 
representable matroids can be characterized by the matroids that are themselves not 
F-representable but for which every minor is F-representable. These minor-minimal 
matroids that are not F-representable are the excluded minors for the class of F- 
representable matroids. 

Tutte wrote [53, p. 292] that his Ph.D. thesis 


attempted to reduce Graph Theory to Linear Algebra. It showed that many graph-theoretical 
results could be generalized to algebraic theorems about structures I called ‘chain-groups’. 
Essentially, I was discussing a theory of matrices in which elementary operations could be 
applied to rows but not columns. 


As Dan Younger noted in his wonderful memoir of Tutte [53, p. 292]: 


This is matroid theory. 


His chain-groups, called nets in his thesis, are essentially row spaces of rep- 
resentative matrices of representable matroids. In a sense, they may be regarded 
as represented matroids. But it would be pedantic to make much of the difference 
between these and representable matroids.

In essence, then, Tutte developed a theory of representable matroids as
generalizations of graphs. Some of his work is valid for arbitrary matroids, in that some
definitions and arguments only use matroid ideas (such as rank) in a way that does 
not depend on representability. But the thesis does not mention arbitrary matroids, 
and does not cite Whitney’s seminal 1935 paper on matroids [52]. 


5 The Excluded-Minor Theorems 


In a commentary on one of his matroid papers, Tutte wrote [46, p. 497], 


If a theorem about graphs can be stated in terms of edges and circuits only it probably 
exemplifies a more general theorem about matroids. 


The application of this principle is evident in much of Tutte’s work and has 
guided the efforts of a number of other researchers in matroid theory. Two of the 
most well-known graphs are K5 and K3,3, the latter being the three-houses-three-
utilities graph. These graphs are forever linked by their appearance in Kuratowski’s
famous characterizations [15] of planar graphs in terms of excluded (topological) 
minors. Tutte introduced the operation of contraction for matroids and also the 
notion of a minor of a matroid. In a very productive period in the late 1950s, Tutte 
published three important papers that included excluded-minor characterizations of 
various classes of matroids. This section will discuss these theorems. 

Looking back on his thesis, Tutte wrote [47, p. 6], 


I went on happily developing a theory of chain-groups and their elementary chains, these 
latter of course being defined by minimal supports. The method was to select theorems about 
graphs and try to generalize them to chain-groups. This was not too difficult for theorems 
expressible in terms of circuits. But theorems about 1-factors imposed problems. As I look
back on this episode I am grieved to recall that I still did not appreciate the work of Whitney 
[on matroids]. Yet these chain-groups were half-way to matroids and their minimal supports 
were Whitney’s matroid circuits. 


Later in the same paper, Tutte wrote [47, p. 7], 


By 1958 ...I had learned to appreciate matroids. I put the work in my thesis into matroid 
terminology and generalized from chain-groups to matroids. ... Then from the thesis- 
theorems I got the now well-known excluded minor conditions for a binary matroid to be 
regular and for a regular matroid to be graphic. 


The uniform matroid U2,4, which geometrically corresponds to four collinear 
points, is the matroid M[A] where A is the real matrix 


$$
A = \begin{array}{c}
\begin{array}{cccc} 1 & 2 & 3 & 4 \end{array} \\
\left[\begin{array}{cccc}
1 & 0 & 1 & 1 \\
0 & 1 & 1 & -1
\end{array}\right]
\end{array}
$$

It is straightforward to see that U2,4 is not binary. Tutte’s first excluded-minor
theorem [31], which is relatively straightforward to prove, establishes that U2,4 is 
the unique excluded minor for the class of binary matroids. 


Theorem 3 (Tutte, 1958) A matroid is binary if and only if it has no U2,4-minor. 
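
The claim that U2,4 is not binary can be confirmed by a finite search. The sketch below is an illustration with hypothetical function names. It relies on one assumption stated here explicitly: since U2,4 has rank 2, it suffices to consider representing matrices with exactly two rows. A 2 x 4 matrix represents U2,4 precisely when all six of its 2 x 2 minors are non-zero, and the real matrix used is the 2 x 4 matrix given above for U2,4, as we read it.

```python
from itertools import combinations, product

REAL_A = [[1, 0, 1, 1],
          [0, 1, 1, -1]]


def det2(matrix, i, j):
    """Determinant of the 2 x 2 submatrix on columns i and j."""
    return matrix[0][i] * matrix[1][j] - matrix[0][j] * matrix[1][i]


def represents_u24(matrix, modulus=None):
    """True if every pair of columns is linearly independent (over Z_modulus if given)."""
    for i, j in combinations(range(4), 2):
        d = det2(matrix, i, j)
        if modulus is not None:
            d %= modulus
        if d == 0:
            return False
    return True


def binary_representation_exists():
    """Try all 2 x 4 matrices over GF(2)."""
    for entries in product((0, 1), repeat=8):
        matrix = [list(entries[:4]), list(entries[4:])]
        if represents_u24(matrix, modulus=2):
            return True
    return False


print(represents_u24(REAL_A), represents_u24(REAL_A, modulus=3), binary_representation_exists())
```

The search fails over GF(2) for a simple reason: there are only three non-zero vectors of length two over GF(2), whereas U2,4 needs four pairwise independent columns.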


A real matrix A is totally unimodular if the determinant of every square 
submatrix of A is in {0,1,—1}. A matroid M is regular if there is a totally 
unimodular matrix A such that M = M[A]. As an example, take a graph G and 
arbitrarily orient its edges. Then take the vertex-edge incidence matrix A of this 
directed graph. In this real matrix, each non-zero column has one 1 and one −1. By
a result of Poincaré [17], A is totally unimodular. The matroid M[A] of this matrix
can be shown to be equal to the cycle matroid M(G) of G. In general, a matroid
is graphic if it equals the cycle matroid of some graph. Thus every graphic matroid is
regular. Tutte [31] proved the following. 


Lemma 3 A matroid M is regular if and only if M is F-representable for all 
fields F. 
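
Total unimodularity of a small matrix can be checked directly from the definition. The following brute-force sketch is our own illustration (function names are hypothetical, and the orientation of K4 is an arbitrary choice); it builds the vertex-edge incidence matrix of an oriented graph and tests every square submatrix.

```python
from itertools import combinations


def det(m):
    """Integer determinant by Laplace expansion along the first row."""
    if not m:
        return 1
    total = 0
    for j, entry in enumerate(m[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * entry * det(minor)
    return total


def is_totally_unimodular(matrix):
    """Check that every square submatrix has determinant in {0, 1, -1}."""
    rows, cols = len(matrix), len(matrix[0])
    for k in range(1, min(rows, cols) + 1):
        for rs in combinations(range(rows), k):
            for cs in combinations(range(cols), k):
                sub = [[matrix[r][c] for c in cs] for r in rs]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True


def incidence_matrix(n, arcs):
    """Rows are vertices 0..n-1, columns are arcs (tail, head); a loop gives a zero column."""
    m = [[0] * len(arcs) for _ in range(n)]
    for j, (tail, head) in enumerate(arcs):
        if tail != head:
            m[tail][j] = -1
            m[head][j] = 1
    return m


K4_ARCS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(is_totally_unimodular(incidence_matrix(4, K4_ARCS)))
print(is_totally_unimodular([[1, 0, 1, 1], [0, 1, 1, -1]]))
```

The second call uses the real matrix representing U2,4 given earlier; it is not totally unimodular because its last two columns have determinant −2, which is consistent with U2,4 not being regular.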


Tutte’s second excluded-minor characterization [31] is significantly more diffi- 
cult than his first. 


Theorem 4 (Tutte, 1958) A matroid is regular if and only if it has none of U2,4,
$F_7$, or $F_7^*$ as a minor.


The last theorem was proved in two papers in the Transactions of the American 
Mathematical Society called A homotopy theorem for matroids I, II. In a 1959 paper
Matroids and graphs in the same journal, Tutte [33] characterized graphic matroids 
in terms of excluded minors. For a graph G, the dual of its cycle matroid M(G) is 
denoted by M*(G). Recognizing the link between cycles in a plane graph and bonds 
in the dual graph, the reader may not be surprised to learn that the circuits of M*(G) 
coincide with the bonds in G. One attractive feature of M*(G) is that it is defined 
whether or not G is planar. Thus, although non-planar graphs do not have graphic 
duals, the cycle matroids of such graphs do have matroid duals. 


Theorem 5 (Tutte, 1959) A regular matroid is graphic if and only if it has neither
M*(K3,3) nor M*(K5) as a minor.


Tutte wrote [47, p. 8] of this theorem that it 


was guided, in the usual vague graph-to-matroid way, by Kuratowski’s Theorem and my 
favourite proof thereof. 


6 Higher Connectivity for Matroids 


Whitney [52] had introduced the notion of a non-separable matroid as one with 
the property that, for every two distinct elements, there is a circuit containing both. 
Such a matroid is now more commonly called connected. A loopless graph G has the
property that every two edges lie in a cycle if and only if G is 2-connected, provided
G has at least three vertices and has no isolated vertices. Given the importance of 
higher connectivity for graphs, it was natural to seek a matroid analogue. Tutte [40] 
did this. One feature of Tutte’s definition was the desire for the connectivity of a 
matroid and its dual to be equal. Of course, a 3-connected graph cannot have a bond 
of size at most two. Dually, Tutte felt that a 3-connected graph should have no cycles 
of size at most two; in other words, it should be simple. Tutte began his work in this 
area by proving the following result for graphs [34]. 


Theorem 6 (Tutte, 1961) A 3-connected simple graph G has an edge e such that 
G\e or G/e is 3-connected and simple unless G is a wheel. 


Five years later, in the paper Connectivity in matroids, Tutte [40] generalized this 
theorem to matroids. Indeed, it is in his commentary [46, p. 487] on this paper that 
Tutte made the statement about generalizing graph results to matroids quoted at the 
beginning of Sect. 5. Let M be a matroid with ground set E. For a subset X of E, the 
rank r(X) of X is the cardinality of the largest independent set that is contained in 
X. Earlier, we defined the rank of a set of edges in a graph G. That rank is precisely 
the rank of X in the cycle matroid of G. 

Tutte defined the matroid M to be 2-connected if 


$$r(X) + r(E - X) - r(M) \ge 1$$


for all $X \subseteq E$ with $|X|, |E - X| \ge 1$. He then defined a 2-connected matroid M to
be 3-connected if


$$r(X) + r(E - X) - r(M) \ge 2$$


for all $X \subseteq E$ with $|X|, |E - X| \ge 2$.

The following result is elementary.


Proposition 2 Let G be a graph with at least four vertices. Then 


(i) M(G) is 2-connected if and only if G is 2-connected and loopless; and 
(ii) M(G) is 3-connected if and only if G is 3-connected and simple. 
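
Tutte's rank conditions can be evaluated directly for the cycle matroid of a small graph. The sketch below is illustrative only: the function names are ours, and we read the two inequalities with "at least" throughout, that is, r(X) + r(E − X) − r(M) ≥ j for all X with |X| and |E − X| at least j, checked for each j below k.

```python
from itertools import combinations


def graph_rank(n, edge_list):
    """Rank in the cycle matroid: number of vertices minus number of components."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    r = 0
    for a, b in edge_list:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            r += 1
    return r


def is_tutte_k_connected(n, edges, k):
    """Check the rank inequalities for j = 1, ..., k - 1 on the cycle matroid M(G)."""
    m = len(edges)
    r_M = graph_rank(n, edges)
    for j in range(1, k):
        for size in range(j, m - j + 1):
            for idx in combinations(range(m), size):
                chosen = set(idx)
                X = [edges[i] for i in idx]
                rest = [edges[i] for i in range(m) if i not in chosen]
                if graph_rank(n, X) + graph_rank(n, rest) - r_M < j:
                    return False
    return True


K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_tutte_k_connected(4, K4, 3), is_tutte_k_connected(4, C4, 2), is_tutte_k_connected(4, C4, 3))
```

In line with Proposition 2, the complete graph on four vertices passes the test for 3-connectivity, while the 4-cycle is 2-connected but fails the second inequality for a pair of edges.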


In the cycle matroid $M(\mathcal{W}_r)$ of the r-spoked wheel $\mathcal{W}_r$, the rim R is a cycle whose
complement is a bond. The set R has the same size as the bases of $M(\mathcal{W}_r)$, that is, as
the spanning trees of $\mathcal{W}_r$. Indeed, Tutte defined a new matroid $\mathcal{W}^r$, the rank-r whirl,
on the set of edges of $\mathcal{W}_r$ having as its bases all of the bases of $M(\mathcal{W}_r)$ together with
the set R.
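
The construction of the whirl from the wheel is easy to carry out explicitly for small rank. The following sketch uses our own labelling (hub 0, rim vertices 1 to r, spokes indexed 0 to r − 1 and rim edges indexed r to 2r − 1) and hypothetical function names. It lists the bases of the cycle matroid of the wheel as spanning trees and then adds the rim as one extra basis; it does not itself verify the matroid axioms.

```python
from itertools import combinations


def wheel(r):
    """Return (spokes, rim) for the r-spoked wheel with hub 0 and rim vertices 1..r."""
    spokes = [(0, i) for i in range(1, r + 1)]
    rim = [(i, i % r + 1) for i in range(1, r + 1)]
    return spokes, rim


def is_forest(n, edge_list):
    """True if the edges contain no cycle (union-find)."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in edge_list:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
    return True


def wheel_bases(r):
    """Spanning trees of the wheel, as frozensets of edge indices."""
    spokes, rim = wheel(r)
    edges = spokes + rim
    # a forest with r edges on r + 1 vertices is a spanning tree
    return {frozenset(idx) for idx in combinations(range(2 * r), r)
            if is_forest(r + 1, [edges[i] for i in idx])}


def whirl_bases(r):
    rim_indices = frozenset(range(r, 2 * r))
    return wheel_bases(r) | {rim_indices}


print(len(wheel_bases(3)), len(whirl_bases(3)), len(whirl_bases(2)))
```

For r = 2 the whirl has every 2-element subset of its four elements as a basis, so it is the uniform matroid U2,4 discussed in Sect. 5.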


Theorem 7 (Tutte, 1966) A 3-connected matroid M has an element e such that 
M\e or M/e is 3-connected unless M has rank at least three and is a whirl or the 
cycle matroid of a wheel. 


In 1980, Seymour [18] generalized this theorem by proving the following.


Theorem 8 (Seymour, 1980) Let M and N be 3-connected matroids such that N
is a proper minor of M. Then M has an element e such that M\e or M/e is 3- 
connected having a minor isomorphic to N unless M is a wheel or a whirl. 


7 The First Conference on Matroids 


In 1964, Jack Edmonds was working at the National Bureau of Standards in 
Washington. He and his colleagues there organized the first conference on matroids. 
Tutte gave a series of Lectures on Matroids [36]. These appeared in the conference 
proceedings, which were published in the Journal of Research of the National 
Bureau of Standards in 1965. Tutte wrote [47, p. 8] about that 1964 meeting,


To me that was the year of the Coming of the Matroids. Then and there the theory of
matroids was proclaimed to the mathematical world. And outside the halls of lecture there
arose the repeated cry: ‘What the hell is a matroid?’


The 1965 Journal of Research of the National Bureau of Standards included 
Tutte’s paper, Menger’s Theorem for matroids [37]. That important paper was 
largely ignored for about 35 years until, in 2002, Geelen et al. [10] recognized its 
utility. The theorem has been used extensively since then. 

For disjoint sets X and Y in a matroid M, define the connectivity between X and 
Y by 


$$\kappa_M(X, Y) = \min\{r(S) + r(E - S) - r(M) : X \subseteq S \subseteq E - Y\}.$$


Theorem 9 (Tutte, 1965) Let X and Y be disjoint sets in a matroid M. Then
$\kappa_M(X, Y)$ is the maximum value of $\kappa_N(X, Y)$ over all minors N of M with ground
set $X \cup Y$.


Subsequently, Geelen et al. [11, Theorem 4.2] proved that this maximum could 
be restricted to minors N of M with $E(N) = X \cup Y$ such that N|X = M|X and
N|Y = M|Y. As an example, let {1, 2, 3} and {4, 5, 6} be the disjoint triangles in
a triangular prism graph P. Then, by contracting the three edges of P that are not 
in triangles, we get a doubled triangle with edge set {1, 2,..., 6}. A consequence 
of Theorem 9 is that, in an arbitrary 3-connected binary matroid M, if {1, 2, 3} 
and {4,5, 6} are disjoint 3-element circuits, then M has a minor on {1,2,..., 6} 
consisting of the cycle matroid of a doubled triangle. 
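
The prism example can be worked through numerically. The sketch below is an illustration under our own conventions: the vertex numbering, the labels 7, 8, 9 for the three edges joining the triangles, and the function names are assumptions, not taken from the cited papers. It computes the connectivity between the two triangles in the cycle matroid of the prism and in the doubled triangle obtained by contracting the three joining edges.

```python
from itertools import combinations


def graph_rank(n, edge_list):
    """Rank in the cycle matroid: number of vertices minus number of components."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    r = 0
    for a, b in edge_list:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            r += 1
    return r


def kappa(n, edges, X, Y):
    """min of r(S) + r(E - S) - r(M) over all S with X <= S <= E - Y.

    edges maps an edge label to its pair of end vertices; X and Y are disjoint label sets.
    """
    E = set(edges)
    free = sorted(E - X - Y)
    r_M = graph_rank(n, edges.values())
    best = None
    for size in range(len(free) + 1):
        for extra in combinations(free, size):
            S = X | set(extra)
            value = (graph_rank(n, [edges[e] for e in S])
                     + graph_rank(n, [edges[e] for e in E - S]) - r_M)
            best = value if best is None else min(best, value)
    return best


# Prism: triangle {1,2,3} on vertices 0,1,2; triangle {4,5,6} on vertices 3,4,5.
PRISM = {1: (0, 1), 2: (1, 2), 3: (2, 0),
         4: (3, 4), 5: (4, 5), 6: (5, 3),
         7: (0, 3), 8: (1, 4), 9: (2, 5)}

# Contracting 7, 8, 9 merges vertex i + 3 into vertex i, giving a doubled triangle.
DOUBLED = {1: (0, 1), 2: (1, 2), 3: (2, 0),
           4: (0, 1), 5: (1, 2), 6: (2, 0)}

X, Y = {1, 2, 3}, {4, 5, 6}
print(kappa(6, PRISM, X, Y), kappa(3, DOUBLED, X, Y))
```

The two values coincide, so for this example the doubled triangle is a minor on X ∪ Y that attains the maximum described in Theorem 9.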


8 Tutte’s Ph.D. Thesis 


So far, we have mostly described Tutte’s published work on matroids. But many of 
his discoveries were made much earlier and were included in his remarkable Ph.D. 
thesis, completed in 1948 [28]. In this section, we discuss some particulars of the
thesis.

In reading the thesis, it must be borne in mind that Tutte’s viewpoint for matroids
is dual to the usual one, so that, for example, his “circuits” in nets generalize 
bonds (or minimal edge cuts) of graphs, and a matroid is “graphic” if its dual 
is graphic in the sense defined above. Similarly, the terminology for deletion and 
contraction aligns with standard usage for graphs but is swapped around for nets; 
see the discussion in [9]. There is also much nonstandard terminology, for example, 
“codendroids” for bases, “dendroids” for cobases, and “cyclic elements” for blocks 
in graphs and components in matroids. 

In Chapter III of the thesis, Tutte presents his extension of Menger’s Theorem to 
matroids although it is not until Chapter VII that he deduces Menger’s Theorem for 
graphs from his generalization. His paper ‘Menger’s theorem for matroids’ was not 
published until 1965 [37]. 

Chapter IV introduces regular matroids, under the name “simple nets”,
approaching them from an unusual direction. Tutte then shows that a matroid of rank r on
n elements is regular (according to his definition) if and only if it has an r × n
representative matrix over Z such that the determinant of every r × r submatrix
is in {0,1, —1}. It is routine to show that this condition is equivalent to total 
unimodularity of the matrix. Parts of this chapter were published and extended in 
[30]. 
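
The equivalence just mentioned can be checked by brute force on small examples. The following Python sketch is illustrative only and is not taken from the thesis: it assumes the representative matrix is written in the standard form [I | D] (an assumption made here), and the names `det`, `minors_ok`, `tutte_condition`, `totally_unimodular`, `with_identity`, `D_GOOD` and `D_BAD` are hypothetical. It compares the condition on the r × r submatrices of [I | D] with total unimodularity of D.

```python
from fractions import Fraction
from itertools import combinations

def det(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n, sign, result = len(m), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            sign = -sign
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return sign * result

def minors_ok(matrix, sizes):
    """True if every k x k submatrix, for k in sizes, has determinant in {0, 1, -1}."""
    r, n = len(matrix), len(matrix[0])
    for k in sizes:
        for rs in combinations(range(r), k):
            for cs in combinations(range(n), k):
                if det([[matrix[i][j] for j in cs] for i in rs]) not in (0, 1, -1):
                    return False
    return True

def tutte_condition(matrix):
    """Only the r x r submatrices, where r is the number of rows."""
    return minors_ok(matrix, [len(matrix)])

def totally_unimodular(matrix):
    """Square submatrices of every size."""
    return minors_ok(matrix, range(1, min(len(matrix), len(matrix[0])) + 1))

def with_identity(d):
    """Prepend an identity block: D -> [I | D]."""
    r = len(d)
    return [[int(i == j) for j in range(r)] + list(d[i]) for i in range(r)]

# Columns of D_GOOD have consecutive ones, so it is totally unimodular.
D_GOOD = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
# D_BAD has determinant 2, so it is not.
D_BAD = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]
```

On these two examples the two conditions agree: both hold for `D_GOOD` and both fail for `D_BAD`.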

Chapter V, the shortest in the thesis, is about his polynomials. It is the only 
chapter of the thesis that contains results he published before the thesis was 
completed in 1948. Its results are generalizations of a subset of those in ‘A ring 
in graph theory’ (published in 1947) [26]. Whereas [26] is restricted to graphs, this 
chapter of the thesis introduces polynomials for representable matroids. Instead of 
the V-functions of [26], we now have chromatic functions, which are called Tutte 
invariants or Tutte-Grothendieck invariants by later writers. Tutte’s definition of 
chromatic functions only needs deletion, contraction, and the notion of a matroid 
component. He then extends the Whitney rank generating function to matroids, 
which only needs a rank function. Thus these definitions make no real use of 
representability, and it is reasonable to regard them as the first extension to matroids 
of any polynomials in the Tutte-Whitney family. It would be another 20 years until 
Crapo [4] formally defined the Tutte polynomial for matroids. 

Tutte gives a recipe theorem for the (matroidal) Whitney rank generating function 
and defines, without name, the (matroidal) Tutte polynomial. An appropriate 
evaluation gives the number of bases, generalizing his observation for the number 
of spanning trees of a graph in [26]. Other evaluations give a representable-matroid 
analogue of counting q-colourings in a graph. Care is needed with Tutte’s terminology 
in this chapter, as discussed in [9]. 
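
The evaluations described above can be illustrated for cycle matroids of small graphs. The sketch below is an assumption-laden illustration, not Tutte's own formulation: it uses the standard subset expansion of the Whitney rank generating function, T(x, y) = Σ_A (x - 1)^(r(E) - r(A)) (y - 1)^(|A| - r(A)), restricted to graphs, and the names `rank` and `tutte` are hypothetical.

```python
from itertools import combinations

def rank(n_vertices, edge_subset):
    """Rank of an edge set in the cycle matroid: size of a spanning forest."""
    parent = list(range(n_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    r = 0
    for u, v in edge_subset:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            r += 1
    return r

def tutte(n_vertices, edges, x, y):
    """Evaluate the Tutte polynomial at integers (x, y) by summing over edge subsets."""
    full = rank(n_vertices, edges)
    total = 0
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            r = rank(n_vertices, subset)
            total += (x - 1) ** (full - r) * (y - 1) ** (k - r)
    return total

TRIANGLE = [(0, 1), (1, 2), (0, 2)]
K4 = list(combinations(range(4), 2))
```

With these definitions, `tutte(n, edges, 1, 1)` counts bases (spanning trees of a connected graph), and for a connected graph with rank r the number of q-colourings is `(-1) ** r * q * tutte(n, edges, 1 - q, 0)`.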

Chapter VI concerns connectivity in binary matroids, extending to them the 
notion of a 2-separation of a graph. For graphs, some of the theory appears in [39, 
Ch. 11]. 

Having worked entirely at the level of representable matroids for Chapters II- 
VI, Tutte establishes the relationship with graphs in Chapter VII. He develops the 
theory of cycle matroids and cocycle matroids and applies the theory of the previous 
chapters to them. Graphic matroids and their duals are shown to be regular. The 


358 G. Farr and J. Oxley 


Tutte polynomial evaluations of Chapter V are specialized to counting colourings 
and spanning trees. The theory of Chapter VI is applied to 2-connected graphs. 
Chapters VIII-IX, occupying 140 pages, give Tutte’s excluded-minor characterization 
of (the duals of) graphic matroids among binary matroids. The four excluded 
minors are called “gnarls” and he calls his result the “gnarl theorem”. It has been 
the foundation and inspiration of matroid structure theory ever since, and is a fitting 
climax for one of the greatest doctoral theses of twentieth century mathematics. 


9 The Move Away from Matroids 


Although Tutte did publish some matroid papers after 1966, these later papers were 
in conference proceedings [43, 44] reiterating results from earlier journal papers or 
were supplements to earlier papers [41, 42]. Tutte’s 1966 paper On the algebraic 
theory of graph colorings [38] proposed a conjecture for binary matroids now 
called Tutte’s Tangential 2-Block Conjecture, which can be viewed as an analogue 
of Hadwiger’s Conjecture. The same paper included [38, p. 22] the following 
conjecture on 4-flows, now known as ‘Tutte’s 4-Flow Conjecture’. This conjecture 
remains open in general. 


Conjecture 3 A graph without cut edges or nowhere-zero 4-flows has a Petersen- 
graph minor. 


For cubic graphs, the last conjecture is equivalent to the assertion that every cubic 
graph without a cut edge or a Petersen-graph minor is 3-edge-colourable. A proof of 
this has been announced by Robertson, Sanders, Seymour, and Thomas. It appears 
in a series of papers including [6], which provides details of the other papers. 
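
The role of the Petersen graph in the cubic case can be seen in a small experiment. The following Python sketch is illustrative and not part of the cited proofs: it uses a plain backtracking search (the names `edge_3_colourable`, `PETERSEN` and `K4` are hypothetical) to confirm that the Petersen graph itself has no 3-edge-colouring, while the cubic graph K4 has one.

```python
def edge_3_colourable(edges):
    """Backtracking search for a proper colouring of the edges with 3 colours."""
    colour = {}

    def conflicts(e, c):
        # An edge conflicts if an already coloured edge sharing a vertex has colour c.
        return any(colour.get(f) == c for f in edges if f != e and set(e) & set(f))

    def extend(i):
        if i == len(edges):
            return True
        e = edges[i]
        for c in range(3):
            if not conflicts(e, c):
                colour[e] = c
                if extend(i + 1):
                    return True
                del colour[e]
        return False

    return extend(0)

# Petersen graph: outer 5-cycle, five spokes, inner pentagram.
PETERSEN = (
    [(i, (i + 1) % 5) for i in range(5)]
    + [(i, i + 5) for i in range(5)]
    + [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
)
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```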

In 1981, Seymour [20] reduced Tutte’s Tangential 2-Block Conjecture to the 4- 
Flow Conjecture by using his decomposition theorem for regular matroids [18]. 

By 1967, Tutte had essentially stopped publishing new results in matroid theory. 
Why? Looking back on his homotopy theorem for matroids, the excluded-minor 
characterization of regular matroids noted above (Theorem 4), Tutte wrote [47, p. 8], 


One aspect of this work rather upset me. I had valued matroids as generalizations of graphs. 
All graph theory, I had supposed would be derivable from matroid theory and so there 
would be no need to do independent graph theory any more. Yet what was this homotopy 
theorem, with its plucking of bits of circuit across elementary configurations, but a result in 
pure graph theory? Was I reducing matroid theory to graph theory in an attempt to do the 
opposite? Perhaps it was this jolt that diverted me from matroids back to graphs. 


10 Tutte’s Contributions 


Tutte’s contributions to mathematics were immense. MathSciNet credits him with 
160 publications. As of May 17, 2018, MathSciNet also lists 3656 citations for his 
papers although it should be noted that this source primarily constructs its list for 
the years 2000 onwards. From 1967, he was the Editor-in-Chief of the Journal of 
Combinatorial Theory. 


Under his leadership the journal flourished. It became such a desirable place to publish that 
in time it was partitioned into two, series A and B, with Tutte retaining the leadership of 
the latter until his retirement as professor from the University of Waterloo in 1985 [53, 
pp. 294-95] 


To this day, that journal remains preeminent in combinatorics. Tutte was a 
founding member of the Department of Combinatorics and Optimization at the 
University of Waterloo, and he had eight Ph.D. students, most notably Ron Mullin 
and Neil Robertson. 

In 2012, British Prime Minister David Cameron wrote a letter to Tutte’s niece 
Jeanne Youlden [53, p. 286] expressing the gratitude of the United Kingdom for 
Tutte’s codebreaking work. Cameron wrote [53, p. 286], 


We should never forget how lucky we were to have men like Professor Tutte in our darkest 
hour and the extent to which their work not only helped protect Britain itself but also shorten 
the war by an estimated two years, saving countless lives. 


One aspect of Tutte’s creative work has yet to be touched on here. The Four 
invented a mathematical poetess named ‘Blanche Descartes’. Any one of them could 
add works under her name but Tutte was believed to be the primary contributor. 


The Four carefully refused to admit Blanche was their creation. Visiting Tutte’s office in 
1968, [Tutte’s fifth Ph.D. student Arthur] Hobbs had the following conversation with him: 
Hobbs: “Sir, I notice you have two copies of that proceedings. I wonder if I could buy your 
extra copy?” 

Tutte: “Oh, no, I couldn’t sell that. It belongs to Blanche Descartes.” [13, p. 4] 


At the conference banquet celebrating Tutte’s eightieth birthday, he recited the 
following poem written by Ms Descartes especially for the occasion [5]. The second 
author, on requesting a copy of the poem from Professor Tutte, was handed the 
original handwritten version. 


The Three Houses Problem 


In central Spain in mainly rain 
Three houses stood upon the plain. 


The houses of our mystery 

To which from realms of industry 
Came pipes and wires to light and heat 
And other pipes with water sweet. 


The owners said, “Where these things cross 
Burn, leak or short, we’ll suffer loss 

So let a graphman living near 

Plan each from each to keep them clear.” 

Tell them, graphman, come in vain, 
They’ll bear the cross that must remain 
Explain the planeness of the plain. 
Blanche Descartes 


Cluster Decorated Geometric Crystals, 
Generalized Geometric 
RSK-Correspondences, and 
Donaldson-Thomas Transformations 


Gleb Koshevoy 


Abstract For a simply connected, connected, semisimple complex algebraic group 
G, we define two geometric crystals on the $\mathcal{A}$-cluster variety of the double Bruhat 
cell $B_- \cap Bw_0B$. These crystals are related by the * duality. We define the graded 
Donaldson-Thomas correspondence as the crystal bijection between these crystals. 
We show that this correspondence is equal to the composition of the cluster chamber 
Ansatz, the inverse generalized geometric RSK-correspondence, and the transposed 
twist map due to Berenstein and Zelevinsky. 


1 Introduction 


For reductive split algebraic groups, Berenstein and Kazhdan [1] defined decorated 
geometric crystals. An important feature of such a crystal is its decoration 
function. For double Bruhat cells, in relation to mirror symmetry, this decoration 
function appeared in [10] as a pullback of a Landau-Ginzburg potential 
defined in the cluster setup in [12] with respect to a proper map from the $\mathcal{A}$-cluster 
variety to the $\mathcal{X}$-cluster variety on the double Bruhat cells, for the Langlands dual groups. 

We follow the recipes of [7, 9, 15], and endow the $\mathcal{A}$-cluster variety of the double 
Bruhat cell $G^{w_0,e} := B_- \cap Bw_0B$ with two geometric crystals for the Langlands dual 
group $G^\vee$, related by the * duality. The Kashiwara crystal admits a duality operation 
*. One may regard the above * duality as a geometric lift of the Kashiwara * duality. 
The decoration function for the * dual geometric crystal can be regarded as the 
pullback of the Landau-Ginzburg potential for the cluster algebra which is obtained 
by reversing the directions of all edges in the quivers considered in [10]. In this paper 
we consider the case of simply-laced groups. 

There are two actions of the Cartan torus H on $G^{w_0,e}$, from the left and from the 
right. Under the action of H from the left, we regard $G^{w_0,e}$ as $H \times B^{w_0}$, where 
$B^{w_0} := B_- \cap Nw_0N$ is the reduced Bruhat cell. Berenstein and Kazhdan endowed such a 
reduced cell with a decorated geometric crystal structure. Under the action of H from 
the right, we regard $G^{w_0,e}$ as $N^{w_0} \times H$, where $N^{w_0} := N_- \cap Bw_0B$ is also a 
reduced cell. We endow $N^{w_0}$ with the * dual decorated geometric crystal structure. For 
the former crystal we let the frozen variables $\Delta_{w_0\omega_i,\omega_i}$, $i \in I$, be fixed (I denotes 
the set of vertices of the Dynkin diagram), while for the * dual crystal we fix 
the other half of the frozen variables, $\Delta_{\omega_i,\omega_i}$, $i \in I$. 

In order to obtain a combinatorial crystal from a geometric one, we have to consider 
toric charts of a positive structure and the corresponding tropicalization [1]. 

There are several positive structures on $G^{w_0,e}$. 

For one such structure we use toric charts which constitute the Berenstein-Zelevinsky 
positive structure of $B^{w_0}$, the BZ-variety, see [1] and [18]. The graded 
cones corresponding to the charts of such a positive structure are defined by 
tropicalization of the Berenstein-Kazhdan decoration function, and the cones turn out 
to be polyhedral realizations of the (graded) Kashiwara crystal due to 
Nakashima-Zelevinsky [19]. Specifically, such charts and cones correspond to the same reduced 
decomposition $\mathbf{i} \in R(w_0)$ of the longest element $w_0$ of the Weyl group. For 
$\mathbf{i} \in R(w_0)$, we denote such a cone by $\mathrm{gr}\,\mathcal{NZ}_{\mathbf{i}}$. 

Another positive structure is related to the Lusztig variety on $N^{w_0}$ [4, 15]. The 
charts of this variety are also defined for reduced decompositions in $R(w_0)$. For 
$\mathbf{i} \in R(w_0)$, the tropicalization with respect to the * dual potential gives a polyhedral 
realization of the combinatorial crystal whose vertices are the lattice points of the 
Kashiwara * dual Lusztig graded cone, $\mathrm{gr}\,\mathcal{L}^*_{\mathbf{i}}$. 

One more positive structure is related to the $\mathcal{A}$-cluster variety. Specifically, we 
consider only those cluster toric charts of the $\mathcal{A}$-cluster variety which correspond 
to the reduced decompositions in $R(w_0)$. We consider two families of positive 
charts: for one we fix the frozen variables $\Delta_{w_0\omega_i,\omega_i}$, $i \in I$ (we call fixing frozen 
variables specialization), and for the other we specialize at the frozen variables $\Delta_{\omega_i,\omega_i}$, $i \in I$. 
For $\mathbf{i} \in R(w_0)$, tropicalization with respect to the corresponding chart of the former 
family and the Landau-Ginzburg potential provides us with the polyhedral realization of 
the Kashiwara crystal that is unimodularly isomorphic to the graded Lusztig cone, 
and tropicalization with respect to the latter family and the * dual LG potential gives 
us the polyhedral realization of the Kashiwara * dual that is unimodularly isomorphic 
to the graded Littelmann cone, see [10]. 

We provide birational positive mappings between these positive structures. For 
that we use birational automorphisms of the torus $(\mathbb{C}^*)^{l(w_0)}$ called the generalized geometric 
RSK-correspondence, gRSK, and its inverse (Sect. 5), two mappings from cluster 
tori localized at frozen coordinates called the Chamber Ansatz [10], and the transposed twist 
map of [3]. 

We define the graded Donaldson-Thomas transformation as the map which, for 
each reduced decomposition $\mathbf{i} \in R(w_0)$, makes the following diagram of 
positive structures on $G^{w_0,e}$ commutative, and the tropical graded DT-transformation as 
the corresponding map with respect to the tropicalization. 
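
\[
\begin{array}{ccc}
\text{Lusztig-variety} \times H & \xrightarrow{\ \eta^{T}_{w_0,e}\ } & H \times \text{BZ-variety} \\
\uparrow \alpha & & \uparrow \beta \\
\text{cluster charts specialized at } \{\Delta_{\omega_i,\omega_i}\}_{i \in I} & \xrightarrow{\ \mathrm{DT}\ } & \text{cluster charts specialized at } \{\Delta_{w_0\omega_i,\omega_i}\}_{i \in I}
\end{array}
\tag{1}
\]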


where $\eta^{T}_{w_0,e}$ is the transposition of the twist map defined in [3, Definition 4.1], $\alpha$ is the 
composition of $CA^-$ and the inverse gRSK, and $\beta$ is the composition of $CA^+$ and gRSK. 
Here $CA^+$ denotes the tuple of maps $\mathrm{gr}\,CA_{\mathbf{i}}$, $\mathbf{i} \in R(w_0)$, and $CA^-$ denotes the 
inverse of $\mathrm{gr}\,NA_{\mathbf{i}}$, defined in [10, Definitions 6.1 and 7.1]; for details see Sects. 5 
and 6. The latter mappings are motivated by the Chamber Ansatz [4] for the Lusztig 
and Berenstein-Zelevinsky parametrizations of $N^{w_0}$ and $B^{w_0}$, respectively. 

The twist $\eta^{T}_{w_0,e}$ is a crystal bijection sending $(N_- \cap Bw_0B) \times H$ to $H \times (B_- \cap Nw_0N)$. 
Since all vertical maps are crystal isomorphisms, we get that the graded 
Donaldson-Thomas transformation is an isomorphism of geometric cluster crystals. 

In other words, the graded Donaldson-Thomas transformation is the composition 
of five maps: the inverse generalized geometric RSK and $CA^-$, sending the cluster 
variety specialized at the half of the frozen variables $\Delta_{\omega_i,\omega_i}$, $i \in I$, to the graded Lusztig variety, 
both endowed with the *-dual geometric crystal structure; then the transposed twist map, 
which sends the Lusztig variety to the Berenstein-Zelevinsky variety, where the 
latter is endowed with the geometric crystal as in [1, 18]; and finally the inverse 
generalized geometric RSK and the inverse $CA^+$, sending the BZ-variety to the cluster 
variety specialized at the other half of the frozen variables, $\Delta_{w_0\omega_i,\omega_i}$, $i \in I$. 

The tropical graded DT-transformation is as in [10]. Specifically, tropicalization 
of the above diagram leads to the definition of a tropical DT-transformation which 
makes the following diagram commutative 


[Diagram (2): top arrow, tropical BZ-twist; vertical arrows, tropical inverse RSK (left) and tropical RSK (right); bottom arrow, tropical DT] 


The mappings between the SW-NE and NW-SE corners of diagram (1) are geometric 
liftings of the Kashiwara * dual crystal isomorphisms between the corresponding 
corners of diagram (2). 

Thus we may regard * duality (Kashiwara * duality) as the composition of 
the transposed twist, the inverse generalized geometric RSK correspondence, and 
the inverse $CA^+$ (tropical twist, tropical gRSK, and tropical inverse $CA^+$). 

Goncharov and Shen [11] conjectured that the Donaldson-Thomas transformation, 
defined for the $\mathcal{X}$-cluster variety, is the twist $\eta_{w_0,e}$ under specialization at all 
frozen variables. This conjecture is proven in [20]. 

Our graded Donaldson-Thomas transformation is the composition of the transposed 
twist and the maps $\alpha$ and $\beta^{-1}$. 


2 Preliminary and Notations 


2.1 Simply-Connected Algebraic Groups 


For a simply connected, connected, semisimple complex algebraic group G, let B 
and $B_-$ be a Borel subgroup and its opposite, N and $N_-$ their unipotent radicals, 
$H = B \cap B_-$ a maximal torus, and $W = \mathrm{Norm}_G(H)/H$ the Weyl group. Let 
$A = (a_{ij})_{i,j \in I}$ be the Cartan matrix; the cardinality $|I|$ of I equals the rank of G. 
The Weyl group W is canonically identified with the Coxeter group generated by 
the involutions $s_1, \ldots, s_{|I|}$, subject to the relations $(s_is_j)^{d_{ij}} = e$, where $d_{ij} = 2, 3, 4, 6$ 
if $a_{ij}a_{ji} = 0, 1, 2, 3$, respectively. A reduced decomposition of $w \in W$ is a word 
$\mathbf{i} = (i_1, \ldots, i_l)$ in the alphabet I such that $w = s_{i_1} \cdots s_{i_l}$ gives a factorization of 
smallest length. The length l is called the length of w and denoted by $l(w)$. For 
$w \in W$, the set of all reduced decompositions is denoted by $R(w)$. We denote 
by $w_0$ the element of maximal length in W. Any two reduced decompositions $\mathbf{i}$, 
$\mathbf{i}' \in R(w)$ are related by the Artin relations. For simply-laced cases, the Artin relations 
are 2-moves and 3-moves. Specifically, a reduced word $\mathbf{j} = (j_1, \ldots, j_l)$ is defined to 
be obtained from $\mathbf{i} = (i_1, \ldots, i_l)$ by a 2-move at position $k \in [l - 1]$ if $i_\ell = j_\ell$ for 
all $\ell \notin \{k, k + 1\}$, $(i_{k+1}, i_k) = (j_k, j_{k+1})$ and $a_{i_k,i_{k+1}} = 0$. 

A reduced word $\mathbf{j}$ is defined to be obtained from $\mathbf{i}$ by a 3-move at position 
$k \in [l - 1]$ if $i_\ell = j_\ell$ for all $\ell \notin \{k - 1, k, k + 1\}$, $j_{k-1} = j_{k+1} = i_k$, $j_k = i_{k-1} = i_{k+1}$ 
and $a_{i_k,i_{k+1}} = -1$. 
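
The 2-moves and 3-moves can be made concrete in type A. The following Python sketch is an illustration under stated assumptions, not part of the paper: it takes W to be the symmetric group S_n with $s_i$ the adjacent transposition of positions i and i + 1 (so $a_{ij} = 0$ when |i - j| > 1 and $a_{ij} = -1$ when |i - j| = 1), and the names `apply_word`, `reduced_words_w0`, `neighbours` and `connected_by_moves` are hypothetical. It enumerates $R(w_0)$ by brute force and checks that all reduced words are linked by such moves.

```python
from itertools import product

def apply_word(word, n):
    """Permutation obtained by applying the adjacent transpositions s_i, i in word."""
    perm = list(range(n))
    for i in word:  # s_i swaps positions i-1 and i (0-based)
        perm[i - 1], perm[i] = perm[i], perm[i - 1]
    return tuple(perm)

def reduced_words_w0(n):
    """All words of length l(w0) = n(n-1)/2 whose product is the longest element."""
    length = n * (n - 1) // 2
    w0 = tuple(reversed(range(n)))
    return {w for w in product(range(1, n), repeat=length) if apply_word(w, n) == w0}

def neighbours(word):
    """Words obtained from `word` by a single 2-move or 3-move."""
    out = set()
    for k in range(len(word) - 1):
        a, b = word[k], word[k + 1]
        if abs(a - b) > 1:  # commuting letters: 2-move
            out.add(word[:k] + (b, a) + word[k + 2:])
    for k in range(1, len(word) - 1):
        a, b, c = word[k - 1], word[k], word[k + 1]
        if a == c and abs(a - b) == 1:  # braid pattern aba -> bab: 3-move
            out.add(word[:k - 1] + (b, a, b) + word[k + 2:])
    return out

def connected_by_moves(words):
    """Depth-first search over the graph whose edges are 2-moves and 3-moves."""
    start = next(iter(words))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours(stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == words
```

For n = 3 this returns the two words 121 and 212, which are related by a single 3-move.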

Let $\mathfrak{g}$ be the Lie algebra of G, and $\mathfrak{h}$ the Cartan subalgebra. Let $\{\alpha_1, \ldots, \alpha_{|I|}\} \subset 
\mathfrak{h}^*$ be the simple roots for which the corresponding root subgroups are contained in N. 
For $i \in I$, let $\phi_i$ be the homomorphism $SL_2 \to G$ corresponding to the ith simple 
root of G. Given $i \in I$, define 


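\[
x_i(t) = \phi_i\begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}, \quad
y_i(t) = \phi_i\begin{pmatrix} 1 & 0 \\ t & 1 \end{pmatrix}, \quad
\bar{s}_i = \phi_i\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \quad
x_{-i}(p) = \phi_i\begin{pmatrix} p^{-1} & 0 \\ 1 & p \end{pmatrix},
\]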


aio =0i(§ a ai(h) = ,ied, 


=4 
Oc i+] 


and 


2.2. Cluster Seeds Associated to Reduced Decompositions 


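Recall that, for a dominant weight $\lambda : H \to \mathbb{G}_m$, the principal minor $\Delta_\lambda : G \to \mathbb{A}^1$ 
is the function defined on the open subset $N_- H N \subset G$ by 

\[ \Delta_\lambda(u_- h u_+) := \lambda(h), \qquad u_- \in N_-,\ h \in H,\ u_+ \in N. \]

Let $\gamma$, $\delta$ be extremal weights such that $\gamma = w_1\lambda$, $\delta = w_2\lambda$ for some $w_1, w_2 \in W$, 
$\lambda \in P^+$. The generalized minor associated to $\gamma$ and $\delta$ is 

\[ \Delta_{\gamma,\delta}(g) = \Delta_\lambda(\bar{w}_1^{-1} g \bar{w}_2), \qquad g \in G, \]

where $\bar{w}$ is a lift of $w$ to $\mathrm{Norm}_G H$ using $\bar{s}_i$, $i \in I$. 
The base affine space G/N is the partial compactification of the open double 
Bruhat cell 

\[ G^{w_0,e} = Bw_0B \cap B_- \]

obtained by allowing the generalized minors $\Delta_{\omega_i,\omega_i}$ and $\Delta_{w_0\omega_i,\omega_i}$ to vanish. 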

Here we need a small part of the cluster seeds of the $\mathcal{A}$-variety. Namely, for a 
reduced decomposition $\mathbf{i} \in R(w_0)$, we consider the corresponding seed $\mathcal{S}(\mathbf{i})$, following 
[5]. The vertices of the quiver $Q(\mathbf{i})$ are labeled by the fundamental weights $\omega_i$, $i \in I$, 
and by $\mathbf{i}|_{\le k}\,\omega_{i_k}$, $k \in [l(w_0)]$, where $\mathbf{i}|_{\le k}$ denotes the subword of $\mathbf{i}$ formed by the first k letters. 

The frozen vertices are labeled by the fundamental weights $\omega_i$, $i \in I$, and $w_0\omega_i$, 
$i \in I$. 

The cluster variables of $\mathcal{S}(\mathbf{i})$ are the generalized minors $\Delta_{\mathbf{i}|_{\le k}\omega_{i_k},\,\omega_{i_k}}$ attached to 
the vertices labeled by $\mathbf{i}|_{\le k}\,\omega_{i_k}$. 

Following [5], we associate to every reduced word $\mathbf{i}$ a seed $\mathcal{S}(\mathbf{i})$. The set of edges 
of the quiver $\Gamma_{\mathbf{i}}$ is described as follows. For $k \in [-n]$ we set $i_k = -k$. For $k \in [l(w_0)]$ we 
denote by $k^+ = k^+_{\mathbf{i}}$ the smallest $\ell$ such that $k < \ell$ and $i_\ell = i_k$. If no such $\ell$ exists, 
we set $k^+ = l(w_0) + 1$. For $k \in [l(w_0)]$, we further let $k^-$ be the largest index $\ell$ 
such that $\ell < k$ and $i_\ell = i_k$. 

There is an edge connecting $v_k$ and $v_\ell$ with $k < \ell$ if at least one of the two 
vertices is mutable and one of the following conditions is satisfied: 
1. $\ell = k^+$, 
2. $\ell < k^+ < \ell^+$, $a_{i_k,i_\ell} < 0$ and $k, \ell \in [N]$. 


Edges of type (1) are called horizontal and are directed from k to $\ell$. Edges of type 
(2) are called inclined and are directed from $\ell$ to k. 

We need the following fact. Let $\mathbf{j} \in R(w_0)$ be obtained from $\mathbf{i}$ by a 3-move in 
position k. Then the transposition $(k, k + 1)$ is an isomorphism of quivers $\Gamma_{\mathbf{j}} \cong \mu_k\Gamma_{\mathbf{i}}$, 
where $\mu_k$ is the mutation at the vertex labeled by $\mathbf{i}|_{\le k}\,\omega_{i_k}$. 

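The new variables are obtained by the $\mathcal{A}$-cluster mutation 

\[
\mu_k\Delta_\ell =
\begin{cases}
\dfrac{\prod_{(\ell',k) \in \Gamma(\mathbf{j})} \Delta_{\ell'} + \prod_{(k,m) \in \Gamma(\mathbf{j})} \Delta_m}{\Delta_k} & \text{if } \ell = k, \\[2ex]
\Delta_\ell & \text{else.}
\end{cases}
\]

For reduced seeds corresponding to reduced words, such cluster mutations take 
the form of Plücker relations between generalized minors. 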


3 Cluster Geometric Crystals 


On the $\mathcal{A}$-cluster variety $G^{w_0,e}$, we define two geometric crystals (for the Langlands 
dual group) related by the Kashiwara *-involution. 


3.1 Geometric Crystal for $\mathcal{A}$-Variety Specialized at $\Delta_{w_0\omega_i,\omega_i}$’s 


We consider the simply-laced case and define the main ingredients of the geometric 
crystal on the $\mathcal{A}$-cluster variety G” obtained from $G^{w_0,e}$ by the specialization at the 
frozen variables $\Delta_{w_0\omega_i,\omega_i}$, $i \in I$. 

For $k \in I$, we denote by $\mathbf{i}(k)$ a reduced decomposition which starts with $s_k$; we call 
such a reduced decomposition optimal from the head for k. 

For such an optimal reduced decomposition $\mathbf{i}(k)$, we consider the corresponding 
seed $\mathcal{S}(\mathbf{i}(k))$. 

We define the crystal actions $f_k(c, \cdot) : \mathbb{C}^* \times G^{w_0,e} \to G^{w_0,e}$, $k \in I$, by 
specifying them on the variables of the seed $\mathcal{S}(\mathbf{i}(k))$. Namely, we set 

\[ f_k(c, \cdot) : \Delta_{\omega_k,\omega_k} \mapsto c\,\Delta_{\omega_k,\omega_k}, \tag{3} \]

and $f_k(c, \cdot)$ does not change the other generalized minors labeling nodes of $\mathcal{S}(\mathbf{i}(k))$. 
In order to get the action of another crystal operation $f_l(c, \cdot)$ on the variables of 
this seed, firstly, we have to express the cluster variables of $\mathcal{S}(\mathbf{i}(k))$ as Laurent 
polynomials in the cluster variables of $\mathcal{S}(\mathbf{i}(l))$, secondly we have to apply $f_l(c, \cdot)$ to 
the variables of these Laurent polynomials, and then express the Laurent 
polynomials so obtained in the variables of $\mathcal{S}(\mathbf{i}(k))$. 

Because of the refinement of the Laurent phenomenon for cluster algebras [8], which 
claims that frozen variables do not appear in the denominators of the Laurent polynomials 
expressing a cluster variable of one seed in the variables of another, we get that the 
crystal operations indeed take the form of Laurent polynomials. 

Note that the frozen variables $\Delta_{w_0\omega_i,\omega_i}$, $i \in I$, do not change under any of these 
crystal actions. This is the reason to specialize the cluster algebra at these frozen 
variables. 

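We take as the potential $\Phi_{BK} : G^{w_0,e} \to \mathbb{C}$ the decoration function due to 
Berenstein and Kazhdan [1], 

\[
\Phi_{BK}(M) = \sum_{i \in I} \frac{\Delta_{w_0\omega_i,s_i\omega_i}(M)}{\Delta_{w_0\omega_i,\omega_i}(M)}
+ \sum_{i \in I} \frac{\Delta_{w_0s_i\omega_i,\omega_i}(M)}{\Delta_{w_0\omega_i,\omega_i}(M)},
\qquad M \in G^{w_0,e}. \tag{4}
\]

For a group G with simply-laced Lie algebra, it follows from [10] that, for each 
k and any reduced decomposition $\mathbf{i}(k)$ optimal for k, we have 

\[
\Phi_{BK}(f_k(c, M_{\mathbf{i}(k)})) - \Phi_{BK}(M_{\mathbf{i}(k)})
= (c - 1)\frac{\Delta_{\omega_k,\omega_k}(M_{\mathbf{i}(k)})}{\Delta_{s_k\omega_k,\omega_k}(M_{\mathbf{i}(k)})}
+ \Big(\frac{1}{c} - 1\Big)\frac{\Delta_{\omega_{k-1},\omega_{k-1}}(M_{\mathbf{i}(k)})\,\Delta_{\omega_{k+1},\omega_{k+1}}(M_{\mathbf{i}(k)})}{\Delta_{s_k\omega_k,\omega_k}(M_{\mathbf{i}(k)})\,\Delta_{\omega_k,\omega_k}(M_{\mathbf{i}(k)})}, \tag{5}
\]

where $M_{\mathbf{i}(k)}$ is a toric chart of the cluster variety $G^{w_0,e}$ written in the cluster variables 
of $\mathcal{S}(\mathbf{i}(k))$. 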

For SL, this means the following. We consider matrix elements of M € G0", 


S187 Oj Oj 


as Laurent polynomials which express , in variables of the cluster seed 


@j—-1,j-1 
S (i(k)). Micg) denotes such a representation of matrix elements. 

Because of that, if we consider a point of the $\mathcal{A}$-variety, that is, a collection of 
tuples related by cluster mutations, then each tuple of the collection defines the 
same matrix. 

Because of the Positivity Theorem [14], these matrix elements are Laurent 
polynomials with non-negative coefficients. 

We define the functions $\varphi$, $\varepsilon$ and $\gamma$, being geometric liftings of the Kashiwara 
functions, as follows. 

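For the seed $\mathcal{S}(\mathbf{i}(k))$, we set 

\[
\varphi_k(M_{\mathbf{i}(k)}) = \frac{\Delta_{s_k\omega_k,\omega_k}(M_{\mathbf{i}(k)})}{\Delta_{\omega_k,\omega_k}(M_{\mathbf{i}(k)})}, \quad
\varepsilon_k(M_{\mathbf{i}(k)}) = \frac{\Delta_{s_k\omega_k,\omega_k}(M_{\mathbf{i}(k)})\,\Delta_{\omega_k,\omega_k}(M_{\mathbf{i}(k)})}{\Delta_{\omega_{k-1},\omega_{k-1}}(M_{\mathbf{i}(k)})\,\Delta_{\omega_{k+1},\omega_{k+1}}(M_{\mathbf{i}(k)})}, \quad \text{and} \quad
\alpha_k(\gamma(M_{\mathbf{i}(k)})) = \frac{\varepsilon_k(M_{\mathbf{i}(k)})}{\varphi_k(M_{\mathbf{i}(k)})}. \tag{6}
\]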
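
Thus we have 

\[
\alpha_k(\gamma(M_{\mathbf{i}(k)})) = \frac{\Delta_{\omega_k,\omega_k}(M_{\mathbf{i}(k)})^2}{\Delta_{\omega_{k-1},\omega_{k-1}}(M_{\mathbf{i}(k)})\,\Delta_{\omega_{k+1},\omega_{k+1}}(M_{\mathbf{i}(k)})}.
\]

Because of the above formula and the fact that all seeds, before the action of the operations $f_k$, 
have the same frozen variables, we get that $\gamma$ does not depend on a seed and 

\[
\alpha_k(\gamma(M)) = \frac{\Delta_{\omega_k,\omega_k}(M)^2}{\Delta_{\omega_{k-1},\omega_{k-1}}(M)\,\Delta_{\omega_{k+1},\omega_{k+1}}(M)}, \qquad M \in G^{w_0,e}. \tag{7}
\]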


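Note that we can also regard the functions $\varphi$ and $\varepsilon$ independently of cluster seeds, 

\[
\varphi_k(M) = \frac{\Delta_{s_k\omega_k,\omega_k}(M)}{\Delta_{\omega_k,\omega_k}(M)}, \qquad
\varepsilon_k(M) = \frac{\Delta_{s_k\omega_k,\omega_k}(M)\,\Delta_{\omega_k,\omega_k}(M)}{\Delta_{\omega_{k-1},\omega_{k-1}}(M)\,\Delta_{\omega_{k+1},\omega_{k+1}}(M)}.
\]

From the refined Laurent phenomenon, we get that, for any cluster seed, the 
functions $\varphi$ and $\varepsilon$ are Laurent polynomials in the variables of that seed. 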


3.2 SL3 


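For example, for $SL_3$ and the cluster seed corresponding to the reduced word 121, 
let us denote the cluster variables by $t_1 = \Delta_{\omega_1,\omega_1}$, $t_2 = \Delta_{s_1\omega_1,\omega_1}$, $t_3 = \Delta_{s_2s_1\omega_1,\omega_1}$, 
$t_{12} = \Delta_{\omega_2,\omega_2}$, $t_{23} = \Delta_{w_0\omega_2,\omega_2}$. 

Then elements of the corresponding cluster chart are matrices of the form 

\[
M_{121} = \begin{pmatrix}
t_1 & 0 & 0 \\
t_2 & \frac{t_{12}}{t_1} & 0 \\
t_3 & \frac{t_1t_{23} + t_3t_{12}}{t_1t_2} & \frac{1}{t_{12}}
\end{pmatrix}
\]
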
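and, since 121 is optimal for $s_1$, the action $f_1(c, \cdot)$ is 

\[
f_1(c, M_{121}) = \begin{pmatrix}
ct_1 & 0 & 0 \\
t_2 & \frac{t_{12}}{ct_1} & 0 \\
t_3 & \frac{ct_1t_{23} + t_3t_{12}}{ct_1t_2} & \frac{1}{t_{12}}
\end{pmatrix}.
\]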


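Then the potential $\Phi_{BK}$ computed in the variables of this cluster chart is 

\[
\frac{t_{12}}{t_1t_2} + \frac{t_2}{t_{12}t_{23}} + \frac{t_{23}}{t_2t_3} + \frac{t_1}{t_2} + \frac{t_3t_{12}}{t_2t_{23}} + \frac{t_2}{t_3}.
\]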
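
The functions are $\varphi_1(M_{121}) = \frac{t_2}{t_1}$, $\varepsilon_1(M_{121}) = \frac{t_1t_2}{t_{12}}$ and $\alpha_1(\gamma) = \frac{t_1^2}{t_{12}}$. 
In order to have the action $f_2(c, \cdot)$ and to compute $\varphi_2(M_{121})$ and $\varepsilon_2(M_{121})$, 
we have to represent $M_{121}$ in the cluster coordinates of the chart for the reduced 
decomposition 212. For 212, we get 

\[
M_{212} = \begin{pmatrix}
t_1 & 0 & 0 \\
\frac{t_1t_{23} + t_3t_{12}}{t_{13}} & \frac{t_{12}}{t_1} & 0 \\
t_3 & \frac{t_{13}}{t_1} & \frac{1}{t_{12}}
\end{pmatrix},
\]

where $t_{13} = \Delta_{s_2\omega_2,\omega_2}$ and, due to the Plücker relation, we have $t_{13}t_2 = t_1t_{23} + t_3t_{12}$, 
where $t_1, t_2, t_3, t_{12}, t_{23}$ are as above, and $t_1, t_3, t_{13}, t_{12}, t_{23}$ are the cluster 
variables of $\mathcal{S}(212)$. 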
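
The Plücker relation used here can be tested numerically. The sketch below is illustrative only and rests on an assumption made for this example: for $SL_3$, each of the variables $t_1, t_2, t_3, t_{12}, t_{13}, t_{23}$ is read as a flag minor, that is, the determinant on the indicated rows and the first one or two columns. The names `flag_minor` and `plucker_holds` are hypothetical.

```python
def flag_minor(matrix, rows):
    """Determinant on the given rows (0-based) and the first len(rows) columns."""
    sub = [[matrix[r][c] for c in range(len(rows))] for r in rows]
    if len(sub) == 1:
        return sub[0][0]
    (a, b), (c, d) = sub
    return a * d - b * c

def plucker_holds(matrix):
    """Check t13 * t2 == t1 * t23 + t3 * t12 for a 3 x 3 matrix."""
    t1, t2, t3 = (flag_minor(matrix, [r]) for r in range(3))
    t12 = flag_minor(matrix, [0, 1])
    t13 = flag_minor(matrix, [0, 2])
    t23 = flag_minor(matrix, [1, 2])
    return t13 * t2 == t1 * t23 + t3 * t12
```

The identity holds for every 3 x 3 matrix, lower triangular or not, since it only involves the first two columns.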


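Then the action $f_2(c, \cdot)$ at the chart $M_{212}$, corresponding to the seed $\mathcal{S}(212)$, 
takes the form 

\[
f_2(c, M_{212}) = \begin{pmatrix}
t_1 & 0 & 0 \\
\frac{t_1t_{23} + ct_3t_{12}}{t_{13}} & \frac{ct_{12}}{t_1} & 0 \\
t_3 & \frac{t_{13}}{t_1} & \frac{1}{ct_{12}}
\end{pmatrix},
\]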

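and, hence, $f_2(c, \cdot)$ acts in the chart for 121 as follows 

\[
f_2(c, M_{121}) = \begin{pmatrix}
t_1 & 0 & 0 \\
t_2 + \frac{(c - 1)t_3t_{12}t_2}{t_1t_{23} + t_3t_{12}} & \frac{ct_{12}}{t_1} & 0 \\
t_3 & \frac{t_1t_{23} + t_3t_{12}}{t_1t_2} & \frac{1}{ct_{12}}
\end{pmatrix}.
\]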

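Hence we get $\varphi_2(M_{121}) = \frac{t_1t_{23} + t_3t_{12}}{t_2t_{12}}$, $\varepsilon_2(M_{121}) = \frac{t_{12}(t_1t_{23} + t_3t_{12})}{t_1t_2}$ and $\alpha_2(\gamma(M)) = \frac{t_{12}^2}{t_1}$. 

Note that the potential $\Phi_{BK}$ computed in the variables of the cluster chart for 212 is 

\[
\frac{t_1}{t_{12}t_{13}} + \frac{t_{13}}{t_1t_3} + \frac{t_3}{t_{13}t_{23}} + \frac{t_1t_{23}}{t_3t_{13}} + \frac{t_{12}}{t_{13}} + \frac{t_{13}}{t_{23}}.
\]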


3.3. Cluster Charts and Geometric Crystals 


Denote by G’(h) a leaf of $G^{w_0,e}$ obtained by fixing $\Delta_{w_0\omega_i,\omega_i} =: h_i$, $i \in I$. Namely, 
for a cluster seed corresponding to a reduced word $\mathbf{i} \in R(w_0)$, we regard the matrix $M_{\mathbf{i}}$ 
as the product 


1 1 1 es 
My = oY (<—)ay (, ———_) ---a_(< ——) Mi, 


Awoe1 Awe n-1,017\-1 Awoo1,o1 


where M; € G" (1). 

Theorem 1 For any $h \in H$, the crystal operations defined by the rule (3), the 
decoration function defined by (4), and the functions $\varphi$, $\varepsilon$, $\gamma$ defined by (6) define a 
geometric crystal on the $\mathcal{A}$-cluster variety G’(h), in the sense of [1]. 


Proof We have to verify that the crystal operations satisfy the Verma relations and 
that the above functions behave properly under the crystal actions. 


a) The claim that the crystal actions $f_k$, $k \in I$, satisfy the Verma relations (quantum 
Yang-Baxter equation¹) 

\[
f_k(c_1, f_{k'}(c_1c_2, f_k(c_2, \cdot))) = f_{k'}(c_2, f_k(c_1c_2, f_{k'}(c_1, \cdot)))
\quad \text{if } k \text{ and } k' \text{ are joined by an edge in the Dynkin diagram,}
\]

and commute otherwise, can be reduced to the same claim for the case of $SL_3$ (see [7]). 
The latter case is a straightforward computation. 

b) The relation between the decoration function $\Phi_{BK}$ and the Kashiwara functions $\varphi$ and 
$\varepsilon$ takes the form 

\[
\Phi_{BK}(f_k(c, M)) - \Phi_{BK}(M) = \frac{c - 1}{\varphi_k(M)} + \frac{c^{-1} - 1}{\varepsilon_k(M)}. \tag{8}
\]

For simply-laced groups, $k \in I$ and a seed $\mathcal{S}(\mathbf{i}(k))$, where $\mathbf{i}(k) \in R(w_0)$ is optimal 
for k, (8) follows from [9, 10]. 


The claim that the relations between $\varepsilon$, $\varphi$ and $\gamma$ fulfill the requirements of [1] also 
follows from the above claim. 


4 * Dual Geometric Crystal 


We define the * dual geometric crystal on the $\mathcal{A}$-variety G!, obtained from $G^{w_0,e}$ by 
the specialization at the frozen variables $\Delta_{\omega_i,\omega_i}$, $i \in I$. 

The Kashiwara crystal admits a duality operation * (see, for example, [13]), and 
one may regard such a * dual geometric crystal as a geometrization of the Kashiwara 
duality. 

Namely, fora € J, a reduced decomposition i“ is optimal from the tail for a, if ig 
is the last element of i*. One can consider i* as reversed i(a) with sy o(;) replacing s;. 

Let us consider the corresponding seed .“(i*). 


¹ The integrable system related to this quantum Yang-Baxter equation is the Toda lattice, and we will 
return to this issue in another paper. 

For such a reduced decomposition $\mathbf{i}^a$, we define the action of $f^*_a(c, \cdot)$ on the 
variables of the seed $\mathcal{S}(\mathbf{i}^a)$ by acting only on the frozen variable 

\[ \Delta_{w_0\omega_a,\omega_a} \mapsto c\,\Delta_{w_0\omega_a,\omega_a} \]

of this seed and not changing the other cluster variables of $\mathcal{S}(\mathbf{i}^a)$. 

To define $f^*_a(c, \cdot)$ in another seed $\mathcal{S}$, we have to mutate from $\mathcal{S}$ to $\mathcal{S}(\mathbf{i}^a)$, 
then apply $f^*_a(c, \cdot)$ on $\mathcal{S}(\mathbf{i}^a)$, and then mutate back to $\mathcal{S}$. 

Note that the frozen variables $\Delta_{\omega_i,\omega_i}$, $i \in I$, do not change under any of these 
crystal actions. Because of that we specialize at these frozen variables. 

To define all functions for a geometric crystal, we first define the decoration 
function 


iM) = ye yy ee ae Gh, 9 
KM) = en te Amer " 


iel iel 


Then, in the seed .“(i*) we get the following functions 


A wosa@a sq (Mic) 


. Mz) = ——————_., 10 
Pry (ay 6 V )) A wigteg tos (Miz) ( ») 
A aWa, a (XA a> 2 (Mic) 
er (a) (Mie) = W0SqMq ., W0Wa .&, I (11) 
Wo@a-1,a-1 (Mya) Awowa+t .@a+1 (Miz) 
Aiiviyea MO? 
Ayo (k) (Y*(M)) = mom kh MeX. (12) 


A woex—1,0K-1 (M) Augers: Ok+1 (M) 


Note that $\gamma^*$ is the ‘highest weight’ for the Kashiwara geometric crystal with the 
potential $\Phi_{BK}$. 

For $SL_n$ and $\mathbf{i} \in R(w_0)$, we have the following relations, which show the symmetry 
of weights and highest weights in the language of geometric crystals, 


ony (Mi)o(y*(Mp) = TY ine" TT Bae xem) (3) 
peT(k) peT (k+l) 


where T(k) is a train track colored by k in the rhombus tiling for i, x (m, ¢) is the 
delta function of positive roots labeled by m-th cluster variable and the tile p, and 
$t := CA^+(\mathcal{S}(\mathbf{i}))$ (for details see [9]). Note that the symmetry between $\gamma$ and $\gamma^*$ breaks 
when we choose the side from which the Cartan torus acts on $B_-$. Another relation 
between weights and highest weights is 


an(y (Mi) uo ay(v*(Mi)) = [] az, I] 4%, (14) 


m 
mel (k) mel (k'):ay ,=—-1 
mm? 


where, for i, 1(k) = {j :ij =k},q:= CA” (7 (i). 

Denote by G!(h) a leaf of $G^{w_0,e}$ with fixed $\Delta_{\omega_i,\omega_i} =: h_i$, $i \in I$. 


Theorem 2 For each h, the above defined crystal actions $f^*_a$, $a \in I$, the 
decoration (9), and the functions (10)-(12) define a geometric crystal on the 
$\mathcal{A}$-cluster variety G!(h). 


Proof For the Verma relations, it suffices to check them for $SL_3$, and this is rather 
straightforward. Then the relations among the actions, decorations and the functions 
required by the axioms of the geometric crystal follow from the above 
property of the potential $\Psi_K$ in each seed that is tail optimal for $a \in I$. 


4.1 SL3 


For SZ3, the cluster seed for 121 is tail optimal for 1 and hence for the second action 
i 


Thus, we have 


ty 0 0) 


* t2 

Sr(c,Mj21) = |] 2 it 

tt3+c3t2 1 
tt2 112 


The cluster seed for 212 is tail-optimal for 2 and hence, is optimal for the crystal 
action f;*, we have 


ty 0 O 
* 5 ct t23+3ti2 112 
fi'(c, M22) = | SS CO 
43 1 
13 ty cty2 


and, hence, ff (c, -) acts in the chart for 121 as follows 


ty 0 0 

* = (c=) ti ty302 ti 
fi (e, Miz) = | 2+ 1123 +6t2 ty o 
t hig3t+bh2 1 
3 tt ct2 


Note that in the cluster chart for 121, the specialization lead to the right action of 
H of the form 


4 0 0 53 0 0 
Q tg o}].-]02 0 
1 


1123 
] Gabstbt) 13 00 
ty tot23 t2 193 

For the cluster chart for 121, the decoration $\Psi_K$ is of the form 


t4 t2 t5 t2 t5ty B 
Ui ee tte = 
tito tats t1f3 ty tate t 


Restricted to the leaf for t; =: hi, ty =: ho, we get 


A 


Ae 1K h 
Why, ho) = —— + 


ty t5 ty tsh 13 
= Dona Clay ome pe 
hy to tsh2 1213 hy hate t2 


For the cluster seed, corresponding to a reduced word 212, we have the cluster 
4 re J, as , fe 
variables ft; = Aw, ,01. t = Asyor,029 4 = Asrsio1,019 13 = Aor,or t5 = Awowr,0r: 
For this seed, ea is of the form 


/ / / / fot -. 
p22 ty ty TL LL 
*K — U7a7 Tae age oe gy (eek Pa 
ae es ee 
4830 ly flys tg lg 


Note that we coincides with ®gx computed in the chart for 121 under 


‘reversing’ of the variables t = ae pk =l,...,5. 


As a generalization of this remark we get the following.

Proposition 3 For a reduced decomposition $\mathbf{i} \in R(w_0)$ and the cluster chart $\mathcal{S}(\mathbf{i})$,
we have


Pie ( SD) = Wig (A U)”). (15) 


We establish an explicit crystal bijection between the geometric crystal and the
$*$-dual geometric crystal below.


5 Piece-Wise Linear Combinatorics 
and RSK-Correspondences 


5.1 Elementary Maps from Which We Make Geometric 
RSK-Correspondences 


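For $w \in W$ and a reduced decomposition $\mathbf{i} \in R(w)$, we define the geometric
$\mathbf{i}$-RSK as the composition of $l(w)$ primitive maps, where $l(w)$ is the length of $\mathbf{i}$. (For
simplicity we regard $\mathbf{i}$ as a word in $I^{l(w)}$.)
For any $\mathbf{i} \in I^{l(w)}$ and $k = 1,\dots,l(w)$ we define a primitive map as the rational
map $\kappa_k : T^{l(w)} \to T^{l(w)}$, $(t_1,\dots,t_{l(w)}) \mapsto (\kappa_k(t)_{k'})_{k'=1,\dots,l(w)}$, where

$$\kappa_k(t)_{k'} = \begin{cases}
t_{k'} & \text{if } k' > k,\\
\sigma_{0,k}(t) & \text{if } k' = k,\\
t_{k'}\,\sigma_{k',k}(t)^{-a_{i_{k'},i_k}} & \text{if } k' < k,\ i_{k'} \neq i_k,\\
\dfrac{t_{k'}}{\sigma_{k'-1,k}(t)\,\sigma_{k',k}(t)} & \text{if } k' < k,\ i_{k'} = i_k,
\end{cases}$$

and where we abbreviated $\sigma_{k',k}(t) := \sum_{k' < l \le k:\ i_l = i_k} t_l$. Clearly, each $\kappa_k$ is a positive
birational isomorphism of $T^{l(w)}$.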
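For a word $\mathbf{i} \in R(w)$, the coordinates of $T^{l(w)}$ are labeled (colored) by the simple
roots according to $\mathbf{i}$:

$$\begin{array}{cccccccc}
t_1 & t_2 & t_3 & \cdots & t_k & \cdots & t_{l(w)-1} & t_{l(w)}\\
\alpha_{i_1} & \alpha_{i_2} & \alpha_{i_3} & \cdots & \alpha_{i_k} & \cdots & \alpha_{i_{l(w)-1}} & \alpha_{i_{l(w)}}
\end{array}$$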


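Suppose, for example, that $s_{i_1} = s_{i_k} = s_{i_{l(w)}}$ and $s_{i_j} \neq s_{i_{l(w)}}$ for the other $j$. Then $\kappa_{l(w)}(t)$
is the following map:

$$\begin{array}{ll}
t_1 \mapsto \dfrac{1}{t_k + t_{l(w)}} - \dfrac{1}{t_1 + t_k + t_{l(w)}}, &
t_j \mapsto t_j\,(t_k + t_{l(w)})^{-a_{i_j,i_1}} \quad (1 < j < k),\\[2ex]
t_k \mapsto \dfrac{1}{t_{l(w)}} - \dfrac{1}{t_k + t_{l(w)}}, &
t_j \mapsto t_j\, t_{l(w)}^{-a_{i_j,i_1}} \quad (k < j < l(w)),\\[2ex]
t_{l(w)} \mapsto t_1 + t_k + t_{l(w)}. &
\end{array}$$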


Definition 1 For a Cartan matrix $A$ and a reduced decomposition $\mathbf{i}$ of $w \in W$, the
composition of maps

$$\kappa^A_{\mathbf{i}} := \kappa_1 \circ \cdots \circ \kappa_{l(w)} \qquad (16)$$

is a geometric $\mathbf{i}$-RSK.


(This definition is a slight generalization of that introduced in [6].)
The geometric $\mathbf{i}$-RSK is a positive birational isomorphism of $T^{l(w)}$ which depends
on $\mathbf{i}$.
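The primitive maps can be checked numerically with exact rational arithmetic. The following is a minimal sketch, not the author's code: it assumes the case-by-case reading of $\kappa_k$ in which coordinates of the same colour are sent to $t_{k'}/(\sigma_{k'-1,k}\sigma_{k',k})$ and coordinates of a different colour are multiplied by $\sigma_{k',k}^{-a_{i_{k'},i_k}}$. The helper names (`cartan_type_a`, `sigma`, `primitive_map`, `geometric_rsk`) are hypothetical. Under these assumptions, for $SL_3$ the words 121 and 212 both give $(t_1t_2/(t_1+t_3),\ t_2t_3,\ t_1+t_3)$.

```python
from fractions import Fraction

def cartan_type_a(rank):
    """Cartan matrix of type A_rank (an assumption: only type A is sketched here)."""
    return [[2 if i == j else (-1 if abs(i - j) == 1 else 0)
             for j in range(rank)] for i in range(rank)]

def sigma(word, t, lo, k):
    """sigma_{lo,k}(t): sum of t_l over lo < l <= k with i_l = i_k (1-based positions)."""
    return sum(t[l - 1] for l in range(lo + 1, k + 1) if word[l - 1] == word[k - 1])

def primitive_map(word, cartan, k, t):
    """Assumed form of kappa_k; coordinates after position k are left untouched."""
    out = list(t)
    for kp in range(1, k + 1):
        if kp == k:
            out[kp - 1] = sigma(word, t, 0, k)
        elif word[kp - 1] != word[k - 1]:
            exponent = -cartan[word[kp - 1] - 1][word[k - 1] - 1]
            out[kp - 1] = t[kp - 1] * sigma(word, t, kp, k) ** exponent
        else:
            out[kp - 1] = t[kp - 1] / (sigma(word, t, kp - 1, k) * sigma(word, t, kp, k))
    return out

def geometric_rsk(word, cartan, t):
    """kappa_1 o ... o kappa_{l(w)}: the rightmost map kappa_{l(w)} is applied first."""
    x = [Fraction(v) for v in t]
    for k in range(len(word), 0, -1):
        x = primitive_map(word, cartan, k, x)
    return x

t1, t2, t3 = Fraction(2), Fraction(3), Fraction(5)
result_121 = geometric_rsk([1, 2, 1], cartan_type_a(2), [t1, t2, t3])
result_212 = geometric_rsk([2, 1, 2], cartan_type_a(2), [t1, t2, t3])
closed_form = [t1 * t2 / (t1 + t3), t2 * t3, t1 + t3]
print(result_121, result_212, closed_form)
```

Exact fractions are used so that the comparison with the closed form is an equality rather than a floating-point approximation.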


Example For $SL_3$ and the word 121, we get

$$\kappa_{121}(t_1, t_2, t_3) = \left(\frac{t_1 t_2}{t_1 + t_3},\ t_2 t_3,\ t_1 + t_3\right).$$

In this example, we have $\kappa_{121} = \kappa_{212}$, but this is because $212 = w_0(1)w_0(2)w_0(1)$.


5.2 Inverse Geometric RSK


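The composition of the following maps provides the inverse map for RSK.

Let $w \in W$ and $\mathbf{i} \in R(w)$. For $k \in [m]$, $m := l(w)$, the map $\kappa_k^{-1} : (\mathbb{C}^*)^m \to (\mathbb{C}^*)^m$
sends the vector $(p_1, \dots, p_m)$ to the vector defined as follows.

Denote $I(i_k) = \{j \in [k] : i_j = i_k\}$, and let $j_1 < j_2 < \dots < j_{|I(i_k)|} = k$ be
the elements of this set.

Then, for $s \in [j_1 - 1]$,

$$\kappa_k^{-1}(p_s) = p_s\, p_k^{a_{i_s,i_k}};$$

for $s = j_1$ (if $|I(i_k)| > 1$),

$$\kappa_k^{-1}(p_s) = \frac{p_s\, p_k^2}{p_s p_k + 1},$$

and we redefine $p_k := p_k^{(1)}$ as $p_k^{(2)} := \dfrac{p_k^{(1)}}{p_{j_1} p_k^{(1)} + 1}$;

for $s \in [j_l - 1] \setminus [j_{l-1}]$,

$$\kappa_k^{-1}(p_s) = p_s \left(p_k^{(l)}\right)^{a_{i_s,i_k}};$$

for $s = j_l$, $l < |I(i_k)|$,

$$\kappa_k^{-1}(p_s) = \frac{p_s \left(p_k^{(l)}\right)^2}{p_s\, p_k^{(l)} + 1},$$

and we define

$$p_k^{(l+1)} := \frac{p_k^{(l)}}{p_{j_l}\, p_k^{(l)} + 1}$$

and continue as above for the next interval $[j_{l+1}] \setminus [j_l]$;
then, for $s = j_{|I(i_k)|} = k$, we set

$$\kappa_k^{-1}(p_k) = p_k^{(|I(i_k)|)},$$

and for $s > k$, $\kappa_k^{-1}(p_s) = p_s$.
We define $\kappa_{\mathbf{i}}^{-1} : (\mathbb{C}^*)^m \to (\mathbb{C}^*)^m$ by the rule

$$\kappa_{\mathbf{i}}^{-1}(p_1, \dots, p_m) := \kappa_m^{-1} \circ \cdots \circ \kappa_2^{-1} \circ \kappa_1^{-1}(p_1, \dots, p_m).$$

Note that $\kappa_1^{-1}(p_1, \dots, p_m) = (p_1, \dots, p_m)$ is the identity map.
For example, for $SL_3$ and the reduced word $s_1 s_2 s_1$, we get

$$\kappa_2^{-1}(p_1, p_2, p_3) = \left(\frac{p_1}{p_2},\ p_2,\ p_3\right),$$

and, writing $(q_1, q_2, q_3)$ for the resulting vector,

$$\kappa_3^{-1}(q_1, q_2, q_3) = \left(\frac{q_1 q_3^2}{q_1 q_3 + 1},\ \frac{q_2 (q_1 q_3 + 1)}{q_3},\ \frac{q_3}{q_1 q_3 + 1}\right).$$

The composition of these maps is

$$\kappa_{121}^{-1} : (p_1, p_2, p_3) \mapsto \left(\frac{p_1 p_3^2}{p_1 p_3 + p_2},\ \frac{p_1 p_3 + p_2}{p_3},\ \frac{p_2 p_3}{p_1 p_3 + p_2}\right).$$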
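The interval-by-interval recipe for $\kappa_k^{-1}$ can be sketched in a few lines. This is an illustration under assumptions, not the author's implementation: it reads the recipe as keeping a "running" value $p_k^{(l)}$ that is updated each time a coordinate of the same colour as $i_k$ is met, and the function names are hypothetical. For $SL_3$ and the word 121 the sketch is compared with the closed form $(p_1p_3^2/(p_1p_3+p_2),\ (p_1p_3+p_2)/p_3,\ p_2p_3/(p_1p_3+p_2))$ and with the forward map $(t_1t_2/(t_1+t_3),\ t_2t_3,\ t_1+t_3)$.

```python
from fractions import Fraction

def inverse_primitive_map(word, cartan, k, p):
    """Assumed form of kappa_k^{-1} (1-based k); coordinates after k are untouched."""
    out = list(p)
    running = p[k - 1]                                  # p_k^{(1)}
    for s in range(1, k):
        if word[s - 1] == word[k - 1]:                  # s = j_l with l < |I(i_k)|
            out[s - 1] = p[s - 1] * running ** 2 / (p[s - 1] * running + 1)
            running = running / (p[s - 1] * running + 1)    # p_k^{(l+1)}
        else:                                           # s strictly between j_{l-1} and j_l
            a = cartan[word[s - 1] - 1][word[k - 1] - 1]
            out[s - 1] = p[s - 1] * running ** a
    out[k - 1] = running                                # p_k^{(|I(i_k)|)}
    return out

def inverse_geometric_rsk(word, cartan, p):
    """kappa_m^{-1} o ... o kappa_1^{-1}: kappa_1^{-1} is applied first."""
    x = [Fraction(v) for v in p]
    for k in range(1, len(word) + 1):
        x = inverse_primitive_map(word, cartan, k, x)
    return x

cartan_a2 = [[2, -1], [-1, 2]]
p1, p2, p3 = Fraction(2), Fraction(3), Fraction(5)
t = inverse_geometric_rsk([1, 2, 1], cartan_a2, [p1, p2, p3])
closed_form = [p1 * p3 ** 2 / (p1 * p3 + p2), (p1 * p3 + p2) / p3, p2 * p3 / (p1 * p3 + p2)]
# Forward map for the word 121, used only to confirm the round trip.
forward = [t[0] * t[1] / (t[0] + t[2]), t[1] * t[2], t[0] + t[2]]
print(t, forward)
```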


5.3. Geometric Lusztig Mutations 


Piece-wise linear combinatorics of canonical bases was defined by Lusztig [3,
15-17] as the tropicalization of the following birational mappings between tori $(\mathbb{G}_m)^l$,
whose coordinates are colored by the corresponding transpositions of a reduced
decomposition $\mathbf{i} \in R(w)$, $w \in W$, where $l$ is the length of $w$.

Here we give the rule for simply-laced groups. The positive birational mappings
between the tori for different reduced decompositions $\mathbf{i}$ and $\mathbf{i}'$ are either the swapping
of coordinates, if the decompositions are related by a 2-move, or

$$(\dots, p, q, r, \dots) \mapsto \left(\dots, \frac{qr}{p+r},\ p+r,\ \frac{pq}{p+r}, \dots\right)$$

for the corresponding 3-move of the decompositions; the map is the identity on the
torus $T$.
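The 3-move rule is the coordinate form of the braid relation for elementary unipotent matrices. The sketch below checks this for $SL_3$, under the assumption (not stated in this section) that $x_i(t) = I + tE_{i,i+1}$ denotes the usual one-parameter subgroup. It verifies $x_1(p)x_2(q)x_1(r) = x_2(p')x_1(q')x_2(r')$ with $(p',q',r') = (qr/(p+r),\ p+r,\ pq/(p+r))$, and that applying the move twice returns the original triple.

```python
from fractions import Fraction

def x(i, t, n=3):
    """Elementary unipotent matrix x_i(t) = I + t*E_{i,i+1} (1-based i); an assumed convention."""
    m = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    m[i - 1][i] = Fraction(t)
    return m

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

def lusztig_3_move(p, q, r):
    """Geometric Lusztig 3-move on three consecutive coordinates."""
    return (q * r / (p + r), p + r, p * q / (p + r))

p, q, r = Fraction(2), Fraction(3), Fraction(5)
p2, q2, r2 = lusztig_3_move(p, q, r)
lhs = matmul(matmul(x(1, p), x(2, q)), x(1, r))
rhs = matmul(matmul(x(2, p2), x(1, q2)), x(2, r2))
print((p2, q2, r2), lhs == rhs)
```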


5.4 Commutativity of Elementary Maps $\kappa_k$ and Lusztig Moves


Proposition 4 For any $k$, the mapping $\kappa_k$ and any geometric Lusztig move
commute.


Proof The statement is clear for 2-moves. It suffices to check the statement for a
3-move and a mapping $\kappa_k$ with $i_k \in \{i_s, i_{s+1}, i_{s+2}\}$, where the latter set of indices
corresponds to the triple of the 3-move, and $s + 2 < k$.

For i; = i;, we have 


a 
toy BO) = eg ) 


Cc 
“t(t+e)
where we denote by $t$ the 'running value' of $t_k$ at step $k - (s + 3) + 1$.
Then the composition of the Lusztig 3-move and $\kappa_k$ is


bc(t +a+c) a+c abt 


(--, ate ‘eae oee 
and the 'running value' at step $k - s + 1$ is $t + a + c$.
On the other hand, the Lusztig map sends
( b ) ( be 4 ab ) 
»4,0,C,. aa > > Cc, reeds 
a+c . a+c 
and 
be ab bc(t+a+c) at+c abt 
Ki(..., ,atc, 6.8 ee 
a+c a+c atc t(tta+c) ate 


and the 'running value' at step $k - s + 1$ is $t + a + c$.
We leave the check of the other possible cases to the reader.


Remark Let us note that in diagram (1), we can consider an expanded version
by replacing the geometric RSK and its inverse by the elementary maps of
which they are composed. In this way we obtain a new family of tori and the
corresponding tropicalizations of the corresponding potentials. The meaning of
these potentials and crystal structures will be explained in another paper.


5.5 Lusztig Variety and the Map $CA^+$


Consider the part of the cluster variety corresponding to seeds labeled by reduced
decompositions, $\mathcal{S}(\mathbf{i})$, $\mathbf{i} \in R(w_0)$.

For a reduced word i, the mutations of the tuples os (Aj) of the cluster variables 
of seeds .“(i) (specialized at the frozen Ayow;,o;,1 € I), 1 € R(wo), at vertices 
corresponding to 3-moves follow the Lusztig rule [10]. Recall that, for a reduced 
decomposition i € R(wg), the Chamber variables are defined as 


—a(ik,i, 
Thi <i <i Ai, - i 
oly I<lk Sip Pi; Pi] 


Ail .j- Wiz Wig Ai<i, Wi, Wi, 


ti) = » ke[l(wo)], (17) 


plus the frozen variables $\Delta_{w_0\omega_i,\omega_i}$, $i \in I$.
We denoted this map by $CA^+$ in the diagram (1). Specifically, for a cluster
seed $\mathcal{S}(\mathbf{i})$, this transformation sends the cluster variables $A(\mathbf{i})$ to $(t_k(\mathbf{i}))_{k=1,\dots,l(w_0)} =:
CA^+(A(\mathbf{i}))$ and leaves unchanged the half of the frozen variables $\Delta_{\omega_i,\omega_i}$, $i \in I$.


Proposition 5 The tuples $\{t_k(\mathbf{i}),\ k \in [l(w_0)]\}$, $\mathbf{i} \in R(w_0)$, form the Lusztig variety
in the sense of Definition 2.2.1 [4].


Proof See, for example [10]. 


This Lusztig variety has the following implementation using elementary matrices.
The following proposition is a cluster version of the Chamber Ansatz of [4].


Proposition 6 For each $\mathbf{i} \in R(w_0)$, the matrix

$$x_{\mathbf{i}}\big(\kappa^A_{\mathbf{i}}(\{t_k(\mathbf{i}),\ k \in [l(w_0)]\})\big) \qquad (18)$$

coincides with $M_{\mathbf{i}}$ under the change of cluster variables (17).


5.6 Berenstein-Zelevinsky Variety 


In [3], the following positive birational maps between the tori $(\mathbb{C}^*)^{l(w_0)}$
labeled by reduced decompositions $\mathbf{i}$ and $\mathbf{i}'$ were considered: the swapping of coordinates for $\mathbf{i}$ and $\mathbf{i}'$ related
by a 2-move, and the positive birational transformations of the form


q 
(2.529%. >G.., Pty PoP ae) 


q 
pe 
for those related by the corresponding 3-move.

The BZ-variety is the collection of tori labeled by the elements of $R(w_0)$ and glued
together along 3-moves by the above BZ-map. For an element $\{p^{\mathbf{i}}_k,\ k \in [l(w_0)]\}$
of the BZ-variety, the product


Xi, (Pi) *** Xt, (Phin) 
does not depend on the choice of a reduced word i € R(wo), see [3]. 


Moreover, the BZ-variety endows $G^{w_0,e}$ with a positive structure. This result
essentially appears in [2].


5.7 Graded Nakashima-Zelevinsky Cone


Theorem 7


1. For $\mathbf{i} \in R(w_0)$, the tropicalization of $\Phi_{BK}$ corresponding to the BZ-torus
labeled by $\mathbf{i}$ defines $\mathrm{gr}\,\mathcal{NZ}_{\mathbf{i}}$, the graded Nakashima-Zelevinsky cone for $\mathbf{i}$;

2. The tropicalization of the BK-potential $\Phi_{BK}$ defined by the torus obtained as the
composition of the geometric RSK $\kappa^A_{\mathbf{i}}$ and the map (17) is $\mathrm{gr}\,\mathcal{L}_{\mathbf{i}}$, the graded Lusztig
cone for $\mathbf{i}$;

3. The geometric RSK $\kappa^A_{\mathbf{i}}$ sends the Lusztig (geometric) crystal actions $f_{\alpha}$, defined
on the variables (17), to the Berenstein-Kazhdan geometric crystal actions [1]
defined on the BZ-variety.


From Theorem 7 it follows that the composition of $CA^+$ and the geometric
RSK-correspondence provides birational maps between the cluster positive structure
and the BZ-positive structure for the same geometric crystal on $G^{w_0,e}$.

Before proving the theorem we give an example.


Example For SL3 and a reduced word 121, we have 


= 0 0 
1f2 
t\t2 1 t 
X-1( )x-2(ht)xrititey=|_ g 0 
N+eB : ; 
1 H+ ht 
_— Ar a Ag -— _A23 : 
Here t) := ayaa ty = Fe aggt 8 §— Baan? and hence, we have the following 
form of the above matrix 
A, A23 0 0 
A2A3 A342 0 


A3 Aj Ao3 
1 Aig Ag Ant 1 
A2A1A3 A3A12 


Note that de-specialization is obtained by multiplication on the left by the 
diagonal matrix 


1 

ae - 0 
0 m4 0 
0 O A; 


The graded Nakashima-Zelevinsky cone for $\mathbf{i}$ is obtained from the realization of the highest weight
Kashiwara crystal with highest weight $\sum_{a \in I} c_a \omega_a$ as the set of integer points of a polytope
obtained as the intersection of the string cone $\mathcal{S}(\mathbf{i})$ and the polyhedron defined by the inequalities:
for each $a \in I$, $r_\gamma \le c_a$, where $\gamma$ runs over the set of crossings $R_a(\mathbf{i})$ wrt the left boundary (see [9]) and
$r_\gamma$ denotes the corresponding Reineke vector.
that is
az 0 0 A Ap3 0 0 A, 0 0 
“ Ao3 A2A3 A3Ai12 a An 
0 Az a Ad3 Ai Ag3 0 = | 42 Ai 0 
0 OA 1 AiA3+Ar3A1 1 Ax Avdstdssar 1 
3 2 A{A3 A3Ai3 3 Ady Az 


the latter is nothing but the cluster torus $M_{121}$ for the reduced word 121.
Thus, the Berenstein-Kazhdan potential computed in the coordinates $t_i$ and $h_i$
on this torus


1 
hits 0 0 
Ago fot 0 
hy 63 hy 3 


1 1 
hy Fy ti + 83) gefats 


AS 1 ht ty +t 
Ogx(t,h) =H +h+h+4—-4+4 143 
hits ho tito 


Recall that the potential ®g x computed at the torus in coordinates p;’s and h;’s 


is 0 0 
ory (It ora(h)x—1(p1)x—-2(p2)x—1(p3) = | B+ 2) BAB 0 
1 
ho in P3 Fy P2 
is 
2 2 
p25 pi 1 a 
Ogx(p,h) = ps + p1+ — 4 4(— + 4 
Ps) ft pz ps” a pt 


The formal tropicalization of $\Phi_{BK}(t, h)$ defines the graded Lusztig cone for 121 and
that of $\Phi_{BK}(p, h)$ defines the graded Nakashima-Zelevinsky cone for 121.


5.8 Proof of Theorem 7 


We consider $SL_n$. Items 1 and 2 are slight generalizations of the Chamber Ansatz of
[3].

Item 3: because of Proposition 4, we can carry out the proof for the lexmin reduced
decomposition $\mathbf{i}_{\min} := 1(21)(321)\cdots(n-1\ \ n-2\ \cdots\ 1)$. We have an explicit form of the
geometric crystal actions. Let us check the statement for $f_{\alpha_{n-1}}$. Namely, we have to
show that the composition $\kappa_{\mathbf{i}}(f_{\alpha_{n-1}}(c, CA^+(A_{\mathbf{i}})))$ amounts to multiplying by $c$ the
$(n(n-1)/2 - n + 2)$th coordinate of $\kappa_{\mathbf{i}}(CA^+(A_{\mathbf{i}}))$. The latter coordinate
corresponds to $s_{n-1}$ of the lexmin reduced decomposition.

With help of (x — 1) 3-moves and (n — 3)(n — 3) /2 2-moves, we can get from the 
lexmin decomposition, the following one imin(m — 1) := 1(21)(21)...(2 -—3n - 
4---1)(n—In—2n—- 1n—3n—2n—4n—3.---12) of I(n—1). The corresponding 
sequence of moves the following point of the Lusztig variety corresponding to 
t™", a tuple of coordinates for the lexmin reduced decomposition. We denote 
t := t™” for simplicity. Denote by b, := ti(wo)—(n—1)—(n—2-s)) § = N— 2,...1, 
the coordinates of t which correspond to the segment (n — 2n — 3---1) of imin and 
by ds := ti(wo)—(n—1—s)s +++ 8 =n—1,...1, the coordinates of t which correspond 
to the segment (n — In — 2---1) of imin. Then we have 


th, k < 2@=) _ (an —3) 
1 p — GEDu-3) 
1 re Pn — ais Pn —3Pn—2 Trap Py Pn—2Pn—1 2 
4&n—1 — 4n—24@n-1 © 4n—34n—-24n—-1 41 n—3 4 n-24n-1 
: b 1 pa GEDO-3) 4 1 
timin—-)) = nee Pn—3?n—2 Pn —=2P n= 1 ~ 2 
Tm—3—WM1 AH —3M|—WMH=1 
bn— k= @=2)0-3) s,s<n—-2 
ee PrP st Pn 21 2 = 
4n—34%-24n-1 In —- 34-241 
1 -1 (n=2)(n=)) 1. , 
Pn—s+14n—s(bn—s + oO a k= H+ 5,55n-1 
4n-3 4-2-1 Te yan —34n—-2%—-1 


By definition, fy, _, changes only one coordinate 


1 
1 bn-2 bn-3bn-2 bi--bn—2bn-1 
Gn-1 + Gn-24n-1 Gn-34n-24n-1 ie se A} +**dn—34n—24n—-1 
of timin(2—1) to 
Cc 
1 bn-2 bn—3bn-2 + by--bn—-2bn-1 : 
an-1 Gn—24n-1 Gn—34n—24n-1 ae \-*-An—34n—24n-1 
We have 
eet) Vine@=1) n(n — 1) ; 
ie +t = by tas, k= caer aa (Qn —3)+5,5<n—1, by_1 :=0. 


Because of this property and since each elementary $\kappa_k$ has the same conservation
law, we get the statement.


6 Inverse Geometric RSK and $*$-Dual Geometric Crystals


One can see that the elementary maps of which the inverse geometric RSK is composed
also commute with the transformations of the Lusztig variety.


6.1 Lusztig Variety and the Map $CA^-$


For each seed .“ (i), with i € R(wo), we make the following change of cluster vari- 
ables wrt the specialization of the frozen variables A,,, »,, denoted by CA~ (A(i)), 
that is inversing the map gr N Aj of [10], and defined by 


Aj <[Wj, Wi 
See pein, (19) 


{i(wo)—I+1 - i te 
Then for a reduced word $\mathbf{i}(k)$ which is tail optimal for $k$, we get that the $*$-dual
action $f_k^*(c, \cdot)$ changes only one variable $q_1$, sending it to $c \cdot q_1$.

Note that, in the coordinates $q_l$, the cluster transformations between seeds labeled
by reduced decompositions are nothing else but the inverse BZ-moves for the 3-braid
moves between the corresponding words. Inverse means the following


q 
Ries 
Por 


q 
ee ne ae ae? aes ). 


For a reduced word $\mathbf{i} \in R(w_0)$, we have the corresponding variant of the
Chamber Ansatz.


Proposition 8 The factorization

$$y_{\mathbf{i}}(\kappa_{\mathbf{i}}^{-1}(q)) \qquad (20)$$

defines the cluster torus factorization $M_{\mathbf{i}}$, written in the coordinates (19).
Here is an example.


Example Consider $SL_3$ and the reduced word 121. Then the inverse RSK sends

$$(q_1, q_2, q_3) \mapsto \left(\frac{q_1 q_3^2}{q_1 q_3 + q_2},\ \frac{q_1 q_3 + q_2}{q_3},\ \frac{q_2 q_3}{q_1 q_3 + q_2}\right)$$

and 


1 0 


0 
2 
1193 gg eS) ¢§ Bes - 1 0 


yi (—————_ q3 
193 + 2 q3 q1q3 + 2 nigra 4 
B 


q193
Recalling that g3 := 2, q2 i= 3, qNi= aS, we get the latter matrix as
1 0 0 
Ao 
rel 1 0 
A3 A3A12+A1A23 1 
Ai A2Aj2 
A; 0 O 
Multiplying the latter matrix on the right by diagonal matrix | 0 Zi we get 
0 O is 
A\ 0 0) 
A12 
A3A a A 
3412+ A A23 
A3 Aj Ag Ai2 


that is, $M_{121}$.


6.2 Lusztig Variety and Decoration $\Psi_K$


For SLy,, we have 


Woex (yi(qa (ti) +++ dni (tnt) = > gi + >> —+— — Tne > 


iel(wo) kel yeRi 
(21) 


where Ri (i) denotes the set of crossings wrt the left boundary [9], and sy is the 
Reineke statistics. 

Thus the tropicalization of this potential defines the *-Kashiwara dual Lusztig 
cone. 

In such a case, we have 


Theorem 9 For anyi€ R(wo), 


Wax (yi(Ky | (q))o1 (21) «+ - @n—1 (An—1)) = 


aD 725" Yo qth? Limat Aim Leeman 


hk—-1hk+1 
kel yert(iy kel + ser(k) 


Note that the tropicalization of the latter potential defines the Littelmann graded
cone.
We also have the following theorem.

Theorem 10


1. For $\mathbf{i} \in R(w_0)$, the tropicalization of $\Psi_K$ corresponding to the torus of the
Lusztig variety labeled by $\mathbf{i}$ defines the $*$-Kashiwara dual graded Lusztig cone
$\mathrm{gr}\,\mathcal{L}_{\mathbf{i}}^*$.

2. The inverse geometric RSK $\kappa_{\mathbf{i}}^{-1}$ sends the dual Kashiwara (geometric) crystal
action $f_{\alpha}$, defined on the variables (19), to the geometric $*$-dual Lusztig crystal
action on the Lusztig variety.


6.3 Transposed BZ-Twist 


Berenstein and Zelevinsky [3, Definition 4.1] defined the twist map between reduced
double Bruhat cells. We use this map for $G^{w_0,e}$, and in this case it is


Nwy.e ? NO B_woB > BO N_woN-, uo e(x) = 1" tw! x14), 
where x — x‘ is the involutive antiautomorphism of G given by 
h'=h™= he A, x(t)’ = 40), yi)! = yild). 
By transposing the BZ-twist we get a map 
NN BwoB > BN N_woN_ 


which is a crystal isomorphism between the $*$-dual geometric crystal for the
Lusztig variety and the geometric crystal for the BZ-variety. This result is a
reformulation of Theorem 5.10 of [3].

Thus all the maps are in place in the diagrams (1) and (2) in order to define the
Donaldson-Thomas transformation through commutativity.

Note that all these maps are crystal isomorphisms between the corresponding cluster
geometric crystals.

Thus we have as a corollary 


Theorem 11 The Donaldson-Thomas transformation defined above is an isomor- 
phism of geometric crystals. 


In particular, the tropical DT-transformation is a crystal isomorphism between
the Littelmann graded cone and the Lusztig graded cone.
Part II
Other Contributed Articles


Fields of Definition of Finite
Hypergeometric Functions


Frits Beukers 


Abstract Finite hypergeometric functions are functions from a finite field $\mathbb{F}_q$ to $\mathbb{C}$.
They arise as Fourier expansions of certain twisted exponential sums and were
introduced independently by John Greene and Nick Katz in the 1980s. They have
many properties in common with their analytic counterparts, the hypergeometric
functions. One restriction in the definition of finite hypergeometric functions is
that the hypergeometric parameters must be rational numbers whose denominators
divide $q - 1$. In this note we use the symmetry in the hypergeometric parameters
and an extension of the exponential sums to circumvent this problem as much as
possible.


1 Introduction 


In the 1980s Greene [4] and Katz [5] independently introduced functions from finite
fields to the complex numbers which can be interpreted as finite sum analogues of
the classical one variable hypergeometric functions. These functions, also known
as Clausen-Thomae functions, are determined by two multisets of $d$ entries in $\mathbb{Q}$
each. We denote them by $\alpha = (\alpha_1, \dots, \alpha_d)$ and $\beta = (\beta_1, \dots, \beta_d)$. Throughout we
assume that these sets have empty intersection when considered modulo $\mathbb{Z}$. The
Clausen-Thomae functions satisfy a linear differential equation of order $d$ with
rational function coefficients. See [1].


Let $\mathbb{F}_q$ be the finite field with $q$ elements. Let $\zeta_p$ be a primitive $p$-th root of unity
and define the additive character $\psi_q(x) = \zeta_p^{\mathrm{Tr}(x)}$, where Tr is the trace from $\mathbb{F}_q$ to
$\mathbb{F}_p$. For any multiplicative character $\chi : \mathbb{F}_q^\times \to \mathbb{C}^\times$ we define the Gauss sum

$$g(\chi) = \sum_{x \in \mathbb{F}_q^\times} \chi(x)\,\psi_q(x).$$

Let $\omega$ be a generator of the character group on $\mathbb{F}_q^\times$. We use the notation $g(m) =
g(\omega^m)$ for any $m \in \mathbb{Z}$. Note that $g(m)$ is periodic in $m$ with period $q - 1$. Note that
the dependence of $g(m)$ on $\zeta_p$ and $\omega$ is not made explicit. Very often we shall need
characters on $\mathbb{F}_q^\times$ of a given order. For that we use the notation $\mathbf{q} = q - 1$, so that a
character of order $d$ can be given by $\omega^{\mathbf{q}/d}$ for example, provided that $d$ divides $\mathbf{q}$ of
course.

Now we define finite hypergeometric sums. Let again $\alpha$ and $\beta$ be multisets of
$d$ rational numbers each, and disjoint modulo $\mathbb{Z}$. We need the following crucial
assumption.


Assumption 1.1 Suppose that

$$(q - 1)\alpha_i,\ (q - 1)\beta_j \in \mathbb{Z}$$

for all $i$ and $j$.


Definition 1.2 (Finite Hypergeometric Sum) Keep the above notation and
Assumption 1.1. We define for any $t \in \mathbb{F}_q$,

$$H_q(\alpha, \beta|t) = \frac{1}{1-q} \sum_{m=0}^{q-2} \prod_{i=1}^{d} \left(\frac{g(m + \alpha_i\mathbf{q})\, g(-m - \beta_i\mathbf{q})}{g(\alpha_i\mathbf{q})\, g(-\beta_i\mathbf{q})}\right) \omega((-1)^d t)^m.$$


m=0i=1 


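It is an exercise to show that the values of $H_q(\alpha, \beta|t)$ are independent of the
choice of $\zeta_p$.
The hypergeometric sums above were considered without the normalizing factor

$$\left(\prod_{i=1}^{d} g(\alpha_i\mathbf{q})\, g(-\beta_i\mathbf{q})\right)^{-1}$$

by Katz in [5, p. 258]. Greene, in [4], has a definition involving Jacobi sums which,
after some elaboration, amounts to

$$\omega(-1)^{|\beta|\mathbf{q}}\, q^{-d} \sum_{m=0}^{q-2} \prod_{i=1}^{d} \frac{g(m + \alpha_i\mathbf{q})\, g(-m - \beta_i\mathbf{q})}{g(\alpha_i\mathbf{q})\, g(-\beta_i\mathbf{q})}\ \omega(t)^m,$$

where $|\beta| = \beta_1 + \cdots + \beta_d$. The normalization we adopt in this paper coincides with
that of McCarthy, [6, Def 3.2].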
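For small primes the finite hypergeometric sum can be evaluated directly with complex floating-point arithmetic. The sketch below is only an illustration for the prime field case $q = p$: it assumes the formula $H_p(\alpha,\beta|t) = \frac{1}{1-p}\sum_{m=0}^{p-2}\prod_i \frac{g(m+\alpha_i(p-1))\,g(-m-\beta_i(p-1))}{g(\alpha_i(p-1))\,g(-\beta_i(p-1))}\,\omega((-1)^dt)^m$, and all function and parameter names (`zeta_exp`, `omega_exp`, and so on) are hypothetical. The parameters `zeta_exp` and `omega_exp` select $\zeta_p$ and $\omega$, so the independence of these choices can be observed numerically for $\alpha = (1/2, 1/2)$, $\beta = (1, 1)$.

```python
import cmath
from fractions import Fraction

def primitive_root(p):
    """Smallest generator of the cyclic group F_p^x (p an odd prime)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found")

def make_characters(p, zeta_exp=1, omega_exp=1):
    """Return (g, omega): the Gauss sum m -> g(omega^m) and the map (x, m) -> omega(x)^m."""
    gen = primitive_root(p)
    log = {pow(gen, k, p): k for k in range(p - 1)}
    def omega(x, m):
        return cmath.exp(2j * cmath.pi * omega_exp * log[x % p] * m / (p - 1))
    def g(m):
        return sum(omega(x, m) * cmath.exp(2j * cmath.pi * zeta_exp * x / p)
                   for x in range(1, p))
    return g, omega

def finite_hypergeometric_sum(p, alpha, beta, t, zeta_exp=1, omega_exp=1):
    """H_p(alpha, beta | t) for t nonzero mod p, under Assumption 1.1 with q = p."""
    for v in list(alpha) + list(beta):
        if (v * (p - 1)).denominator != 1:
            raise ValueError("Assumption 1.1 fails: (p-1)*parameter must be an integer")
    g, omega = make_characters(p, zeta_exp, omega_exp)
    d = len(alpha)
    a = [int(v * (p - 1)) for v in alpha]
    b = [int(v * (p - 1)) for v in beta]
    total = 0
    for m in range(p - 1):
        term = omega(((-1) ** d * t) % p, m)
        for ai, bi in zip(a, b):
            term *= g(m + ai) * g(-m - bi) / (g(ai) * g(-bi))
        total += term
    return total / (1 - p)

alpha = [Fraction(1, 2), Fraction(1, 2)]
beta = [Fraction(1), Fraction(1)]
values = [finite_hypergeometric_sum(7, alpha, beta, 3, zeta_exp=z) for z in range(1, 7)]
other_omega = finite_hypergeometric_sum(7, alpha, beta, 3, omega_exp=5)
print(values[0], other_omega)
```

Floating-point evaluation is adequate here because only agreement up to rounding is being checked; an exact treatment would work in a cyclotomic field instead.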
Let

$$A(x) = \prod_{j=1}^{d} \left(x - e^{2\pi i \alpha_j}\right), \qquad B(x) = \prod_{j=1}^{d} \left(x - e^{2\pi i \beta_j}\right).$$

An important special case is when $A(x), B(x) \in \mathbb{Z}[x]$. In that case we say that
the hypergeometric sum is defined over $\mathbb{Q}$. Another way of describing this case
is that $k\alpha \equiv \alpha \pmod{\mathbb{Z}}$ and $k\beta \equiv \beta \pmod{\mathbb{Z}}$ for all integers $k$ relatively prime
to the common denominator of the $\alpha_i, \beta_j$. In other words, multiplication by $k$ of
the $\alpha_i \pmod{\mathbb{Z}}$ simply permutes these elements. Similarly for the $\beta_j$. From work
of Levelt [1, Thm 3.5] it follows that in such a case the monodromy group of
the classical hypergeometric equation can be defined over $\mathbb{Z}$. It also turns out that
hypergeometric sums defined over $\mathbb{Q}$ occur in point counts in $\mathbb{F}_q$ of certain algebraic
varieties, see [2, Thm 1.5] and the references therein. It is an easy exercise to show
that $H_q(\alpha, \beta|t)$ is independent of the choice of $\omega$ (it is already independent of the
choice of $\psi_q$).

One of the obstacles in the definition of finite hypergeometric sums over $\mathbb{Q}$ is
Assumption 1.1, which has to be made on $q$, whereas one has the impression that
such sums can be defined for any $q$ relatively prime with the common denominator
of the $\alpha_i, \beta_j$. This is resolved in [2, Thm 1.3] by an extension of the definition
of hypergeometric sum. The idea is to apply the theorem of Hasse-Davenport to
the products of Gauss sums which occur in the coefficients of the hypergeometric
sum. Another way of dealing with this problem is given by McCarthy, who uses
the Gross-Koblitz theorem which expresses Gauss sums as values of the $p$-adic
$\Gamma$-function.


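Theorem 1.3 (Gross-Koblitz) Let $\omega$ be the inverse of the Teichmüller character.
Let $\pi^{p-1} = -p$ and let $\zeta_p$ be such that $\zeta_p \equiv 1 + \pi \pmod{\pi^2}$. Let $\Gamma_p$ be the $p$-adic
Morita $\Gamma$-function. Let $q = p^f$ and let $g_q(m)$ denote the Gauss sum over $\mathbb{F}_q$ with
multiplicative character $\omega^m$. Then, for any integer $m$ we have

$$g_q(m) = -\prod_{i=0}^{f-1} \pi^{(p-1)\left\{\frac{p^i m}{q-1}\right\}}\ \Gamma_p\left(\left\{\frac{p^i m}{q-1}\right\}\right).$$

Here $\{x\} = x - \lfloor x \rfloor$ is the fractional part of $x$. In particular, when $q = p$ we get

$$g_p(m) = -\pi^{(p-1)\left\{\frac{m}{p-1}\right\}}\ \Gamma_p\left(\left\{\frac{m}{p-1}\right\}\right).$$

See Henri Cohen's book [3] for a proof. When $p$ does not divide the common
denominator of the $\alpha_i, \beta_j$ one easily writes down a $p$-adic version of our
hypergeometric sum for the case $q = p$.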


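Definition 1.4 We define $G_p(\alpha, \beta|t)$ by the sum

$$G_p(\alpha, \beta|t) = \frac{1}{1-p} \sum_{m=0}^{p-2} (-p)^{\Lambda(m)} \prod_{i=1}^{d} \frac{\Gamma_p\left(\left\{\alpha_i + \frac{m}{p-1}\right\}\right) \Gamma_p\left(\left\{-\beta_i - \frac{m}{p-1}\right\}\right)}{\Gamma_p(\{\alpha_i\})\, \Gamma_p(\{-\beta_i\})}\ \omega((-1)^d t)^m,$$

where

$$\Lambda(m) = \sum_{i=1}^{d} \left(\left\{\alpha_i + \frac{m}{p-1}\right\} - \{\alpha_i\} + \left\{-\beta_i - \frac{m}{p-1}\right\} - \{-\beta_i\}\right).$$

Note that

$$\Lambda(m) = \sum_{i=1}^{d} \left(-\left\lfloor \alpha_i + \frac{m}{p-1} \right\rfloor + \lfloor \alpha_i \rfloor - \left\lfloor -\beta_i - \frac{m}{p-1} \right\rfloor + \lfloor -\beta_i \rfloor\right).$$

In particular $\Lambda(m) \in \mathbb{Z}$. Definition 1.4 almost coincides with McCarthy's function
${}_dG_d$ from [6, Def 1.1] in the sense that our function coincides with ${}_dG_d(1/t)$. We
prefer to adhere to the definition given above. The advantage of Definition 1.4 is
that Assumption 1.1 is not required; it is well-defined for all parameters $\alpha_i, \beta_j$ as
long as they are $p$-adic integers. Define

$$\delta = \delta(\alpha, \beta) = \max_{x \in [0,1]} \sum_{i=1}^{d} \left(\lfloor x + \alpha_i \rfloor - \lfloor \alpha_i \rfloor + \lfloor -x - \beta_i \rfloor - \lfloor -\beta_i \rfloor\right).$$

Then, using Definition 1.4 and the fact that $-\Lambda(m) \le \delta$, one easily deduces that
$p^\delta G_p(\alpha, \beta|t)$ is a $p$-adic integer. In [6, Prop 3.1] we find this in a slightly different
formulation. However, it is not clear from the definition whether this value is
algebraic or not over $\mathbb{Q}$. It is the purpose of the present note to be a bit more specific
by proving the following theorem.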


Theorem 1.5 Let notations be as above and let $K$ be the field extension of $\mathbb{Q}$
generated by the coefficients of $A(x)$ and $B(x)$. Suppose $p$ splits in $K$, i.e. $p$ factors
into $[K : \mathbb{Q}]$ distinct prime ideals in the integers of $K$. Let $\Delta = \max_k \delta(k\alpha, k\beta)$
over all integers $k$ relatively prime with the common denominator of the $\alpha_i, \beta_j$. Then
$p^\Delta G_p(\alpha, \beta|t)$ is an algebraic integer in $K$.
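The exponents appearing in the theorem are elementary to compute. The sketch below is illustrative only: it assumes that $\Lambda(m)$ is the sum of fractional parts $\{\alpha_i + m/(p-1)\} - \{\alpha_i\} + \{-\beta_i - m/(p-1)\} - \{-\beta_i\}$, that $\delta$ is the maximum over $x \in [0,1]$ of the corresponding floor expression, and that the maximum of that step function can be found by sampling its breakpoints and the midpoints between them. The names `Lambda`, `delta` and `big_delta` are hypothetical.

```python
from fractions import Fraction
from math import floor, gcd

def frac(x):
    """Fractional part {x} = x - floor(x)."""
    return x - floor(x)

def Lambda(alpha, beta, m, p):
    """Assumed exponent Lambda(m) of (-p) in the p-adic sum."""
    x = Fraction(m, p - 1)
    return sum(frac(a + x) - frac(a) + frac(-b - x) - frac(-b)
               for a, b in zip(alpha, beta))

def step(alpha, beta, x):
    """The step function whose maximum over [0, 1] is delta(alpha, beta)."""
    return sum(floor(x + a) - floor(a) + floor(-x - b) - floor(-b)
               for a, b in zip(alpha, beta))

def delta(alpha, beta):
    """Maximum of the step function, sampled at breakpoints and midpoints."""
    pts = sorted({Fraction(0), Fraction(1)}
                 | {frac(-a) for a in alpha} | {frac(-b) for b in beta})
    candidates = pts + [(u + v) / 2 for u, v in zip(pts, pts[1:])]
    return max(step(alpha, beta, x) for x in candidates)

def big_delta(alpha, beta):
    """Maximum of delta(k*alpha, k*beta) over k coprime to the common denominator."""
    n = 1
    for v in list(alpha) + list(beta):
        n = n * v.denominator // gcd(n, v.denominator)
    return max(delta([k * a for a in alpha], [k * b for b in beta])
               for k in range(1, n + 1) if gcd(k, n) == 1)

alpha = [Fraction(1, 2), Fraction(1, 2)]
beta = [Fraction(1), Fraction(1)]
p = 5
lambdas = [Lambda(alpha, beta, m, p) for m in range(p - 1)]
print(lambdas, delta(alpha, beta), big_delta(alpha, beta))
```

For these parameters the exponent $\Delta$ vanishes, which is consistent with the fact that the corresponding sums are defined over $\mathbb{Q}$.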


For the proof we construct in Sect. 2 a generalization of the hypergeometric
function, $H_q(A, B|t)$, involving two semisimple finite algebras $A$ and $B$ over $\mathbb{F}_q$.
We show that it belongs to $K$ and then, in Sect. 3, identify its $p$-adic evaluation with
$G_p(\alpha, \beta|t)$.


2 Gauss Sums on Finite Algebras 


The main idea of the proof of Theorem 1.5 is to use Gauss sums on finite
commutative algebras over $\mathbb{F}_q$ with 1. Let $A$ be such an algebra. For any $x \in A$
we define the trace $\mathrm{Tr}(x)$ and norm $N(x)$ as the trace and norm of the $\mathbb{F}_q$-linear
map given by multiplication with $x$ on $A$.
Choose an additive character $\psi$ on $A$ which is primitive. That is, for any ideal
$I \subset A$, $I \neq (0)$, there exists $x \in I$ such that $\psi(x) \neq 1$. Any other non-degenerate
additive character is of the form $\psi(ax)$ with $a \in A^\times$. A multiplicative character $\chi$ is
called primitive if its kernel does not contain any subgroup of the form $\{1 + a \mid a \in I\}$
for some non-zero ideal $I$ in $A$.

For any multiplicative character $\chi$ on $A^\times$ we can define a Gauss sum

$$g_A(\psi, \chi) = \sum_{x \in A^\times} \psi(x)\,\chi(x).$$

When $A$ is not semisimple, the Gauss sum can be 0, as illustrated by the following
example.

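Example 2.1 Let $A = \mathbb{F}_p[x]/(x^2)$. Choose the additive character $\psi(a + bx) = \zeta_p^b$.
It is easy to see that this is a primitive character. Note that $a + bx \in A^\times \iff
a \in \mathbb{F}_p^\times$. Let $\chi$ be a nontrivial multiplicative character on $\mathbb{F}_p^\times$ and extend it to $A^\times$ by
$\chi(a + bx) = \chi(a)$. Then

$$g_A(\psi, \chi) = \sum_{a \in \mathbb{F}_p^\times,\ b \in \mathbb{F}_p} \zeta_p^b\, \chi(a) = 0.$$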
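The vanishing in Example 2.1 is easy to confirm numerically. The sketch below takes $p = 5$ and, as an illustrative choice that is not made in the text, the Legendre symbol as the nontrivial character $\chi$. It evaluates the double sum over $a \in \mathbb{F}_p^\times$, $b \in \mathbb{F}_p$ and, for contrast, the ordinary quadratic Gauss sum on the field $\mathbb{F}_p$, whose absolute value squared is $p$.

```python
import cmath

p = 5
zeta = cmath.exp(2j * cmath.pi / p)

def chi(a):
    """Legendre symbol mod p: a nontrivial multiplicative character on F_p^x."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

# Gauss sum on A = F_p[x]/(x^2) with psi(a + b x) = zeta^b and chi(a + b x) = chi(a).
algebra_sum = sum(zeta ** b * chi(a) for a in range(1, p) for b in range(p))

# Ordinary Gauss sum on the field F_p, for comparison.
field_sum = sum(zeta ** a * chi(a) for a in range(1, p))

print(abs(algebra_sum), abs(field_sum) ** 2)
```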


So we restrict ourselves to semisimple algebras. These are precisely the finite
sums of finite field extensions of $\mathbb{F}_q$. In this case there is an obvious choice for the
additive character.


Lemma 2.2 Suppose $A$ is a direct sum of finite field extensions of $\mathbb{F}_q$. Then $\psi(x) =
\zeta_p^{\mathrm{Tr}(x)}$ is a primitive additive character.

Proof Let $A = \bigoplus_{i=1}^{r} F_i$ with $F_i$ a finite field extension of $\mathbb{F}_q$ for all $i$. Then
$\psi(x) = \zeta_p^{\mathrm{Tr}_1(x_1) + \cdots + \mathrm{Tr}_r(x_r)}$, where $\mathrm{Tr}_i$ stands for the trace function on $F_i$. If $\psi$ were
not primitive then there would exist $a \in A$, $a \neq 0$, such that $\psi(ax) = 1$ for all $x \in A$.
Suppose $a = (a_1, \dots, a_r)$ and assume, without loss of generality, $a_1 \neq 0$. Then
$\psi(a \cdot (x, 0, \dots, 0)) = \zeta_p^{\mathrm{Tr}_1(a_1 x)} = 1$ for all $x \in F_1$. By the properties of the trace of a
field this is not possible. □


From now on we use the trace character on a semisimple algebra $A$ as additive
character and write $g_A(\chi)$ for the Gauss sum. So we dropped the dependence of the
Gauss sum on the additive character. The only amount of freedom in the additive
character rests on the choice of $\zeta_p$.

Proposition 2.3 Let $A$ be a direct sum of finite fields over $\mathbb{F}_q$ and $\psi(x) = \zeta_p^{\mathrm{Tr}(x)}$ the
additive character. Let $\chi$ be a multiplicative character. Then there exists a non-negative
integer $f$ such that

$$|g_A(\chi)|^2 = q^f.$$

Proof Again, write $A = \bigoplus_{i=1}^{r} F_i$. Then $\chi$ can be written as $\chi(x_1, \dots, x_r) =
\chi_1(x_1) \cdots \chi_r(x_r)$, where $\chi_i$ is a multiplicative character on $F_i^\times$. This implies that

$$g_A(\psi, \chi) = \prod_{i=1}^{r} g(\chi_i),$$

where $g(\chi_i)$ is the usual Gauss sum on the field $F_i$. The additive character on $F_i$ is
$\zeta_p^{\mathrm{Tr}_i(x)}$ with the same choice of $\zeta_p$ for each $i$. Our assertion follows directly. □
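The factorization used in the proof can be observed on the smallest semisimple example, $A = \mathbb{F}_p \oplus \mathbb{F}_p$ with $p = 7$. The sketch below makes several illustrative assumptions: Proposition 2.3 is read as $|g_A(\chi)|^2 = q^f$, the characters are $\chi = (\omega^{m_1}, \omega^{m_2})$ for the generator $\omega$ determined by the primitive root 3 modulo 7, and the helper names are hypothetical. In this case $f$ is the number of nontrivial components of $\chi$.

```python
import cmath

p = 7
zeta = cmath.exp(2j * cmath.pi / p)
gen = 3                                    # a primitive root modulo 7
log = {pow(gen, k, p): k for k in range(p - 1)}

def omega(x, m):
    """omega(x)^m for the generator omega with omega(gen) = exp(2*pi*i/(p-1))."""
    return cmath.exp(2j * cmath.pi * log[x] * m / (p - 1))

def gauss_field(m):
    """Usual Gauss sum g(omega^m) on F_p."""
    return sum(omega(x, m) * zeta ** x for x in range(1, p))

def gauss_algebra(m1, m2):
    """Gauss sum on A = F_p + F_p with the trace character and chi = (omega^m1, omega^m2)."""
    return sum(omega(x1, m1) * omega(x2, m2) * zeta ** ((x1 + x2) % p)
               for x1 in range(1, p) for x2 in range(1, p))

print(abs(gauss_algebra(2, 0)) ** 2, abs(gauss_algebra(2, 3)) ** 2)
```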


Choose two finite semisimple algebras $A, B$ over $\mathbb{F}_q$. Choose the trace characters
on each of them with the same choice of $\zeta_p$ and call them $\psi_A, \psi_B$. Let $\chi_A, \chi_B$ be
multiplicative characters on $A^\times, B^\times$. Denote the norms on $A, B$ by $N_A, N_B$.


Definition 2.4 We define 


| — 
Ay (A, B\t) = ——____ y Wa) Wa(—y) xa) xB), 


BA(XA)8B(XB) easgee dn DAO) 


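for any $t \in \mathbb{F}_q^\times$.
The following theorem gives its Fourier expansion in $t$.


Theorem 2.5 Let $\omega$ be a generator of the multiplicative characters on $\mathbb{F}_q^\times$. When
the context is clear we denote both functions $\omega(N_A(x))$ and $\omega(N_B(y))$ by $\omega_N$. We
then have,

$$H_q(A, B|t) = \frac{1}{1-q} \sum_{m=0}^{q-2} \frac{g_A(\chi_A \omega_N^m)\, g_B(\chi_B \omega_N^{-m})}{g_A(\chi_A)\, g_B(\chi_B)}\ \omega(N_B(-1)\,t)^m.$$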


Proof We compute the Fourier expansion $\sum_{m=0}^{q-2} c_m\, \omega(t)^m$ of $H_q(A, B|t)$. The
coefficient $c_m$ can be computed using

$$c_m = \frac{1}{q-1} \sum_{t \in \mathbb{F}_q^\times} H_q(A, B|t)\, \omega(t)^{-m}.$$

When we substitute the definition for $H_q(A, B|t)$ in the summation over $t$, we get a
summation over $t \in \mathbb{F}_q^\times$, $x \in A^\times$, $y \in B^\times$ with the restriction $t N_A(x) = N_B(y)$. So
we might as well substitute $t = N_B(y)/N_A(x)$ and sum over $x, y$. We get,


1 1 —1 m —m 
Cm = ——— > gp AOR) XA) xB) w(Na(x))"@(Np(y))™.
The summation over $x$ yields $g_A(\chi_A \omega_N^m)$. To sum over $y$ we first replace $y$ by $-y$
and then perform the summation. We get $\omega(N_B(-1))^m g_B(\chi_B \omega_N^{-m})$. This proves
our theorem. □


Example 2.6 As in the previous section take two multisets of hypergeometric
parameters $\alpha, \beta$. Suppose that $(q - 1)\alpha_i, (q - 1)\beta_j$ are in $\mathbb{Z}$ for all $i, j$. Take
$A = B = \mathbb{F}_q^d$, the direct sum of $d$ copies of $\mathbb{F}_q$ with componentwise addition
and multiplication. The norm on $A, B$ is given by $N(x_1, \dots, x_d) = x_1 \cdots x_d$.
In particular $N_B(-1) = (-1)^d$. For both $A, B$ we take the additive character
$\psi(x_1, \dots, x_d) = \zeta_p^{\mathrm{Tr}(x_1 + \cdots + x_d)}$, where Tr is the trace function on $\mathbb{F}_q$. As multiplicative
characters we take


d d 
Ceara =[[eGQe™: FaGinneagia[laope 
i=l j=l 


An easy calculation shows that $g_A(\chi_A \omega_N^m) = \prod_{i=1}^{d} g(m + (q - 1)\alpha_i)$ and
similarly for $g_B$. So we see that we recover the finite hypergeometric sum of the
previous section.


Lemma 2.7 Suppose $\dim_{\mathbb{F}_q}(A) = \dim_{\mathbb{F}_q}(B)$. Then $H_q(A, B|t)$ does not depend
on the choice of $\zeta_p$ in the additive characters.

As a corollary, in this equi-dimensional case the values of $H_q(A, B|t)$ are
contained in the field generated by the character values of $\chi_A, \chi_B$.


Proof When we choose $\zeta_p^a$ with $a \in \mathbb{F}_p^\times$ instead of $\zeta_p$ in the definition of the additive
character, it is easy to check that $g_A(\chi_A)$ gets replaced by $\chi_A(a)^{-1} g_A(\chi_A)$. And
similarly for $B$. As a corollary any term in the hypergeometric sum
in Theorem 2.5 is multiplied by $\omega(N_B(a)/N_A(a))^m$. Since $a$ is a scalar,
$N_A(a) = N_B(a) = a^d$, where $d = \dim_{\mathbb{F}_q}(A) = \dim_{\mathbb{F}_q}(B)$. Hence, in the case
of equal dimensions of $A, B$ the multiplication factor is 1.

Let $\sigma \in \mathrm{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$ be such that it fixes the values of $\chi_A, \chi_B$ but sends $\zeta_p$ to $\zeta_p^a$.
According to the above calculation $H_q(A, B|t)$ is fixed under this substitution and
hence under $\sigma$. □


Let us return momentarily to Example 2.6 and suppose that the parameters $\alpha, \beta$ have
the property that $k\alpha \equiv \alpha \pmod{\mathbb{Z}}$, $k\beta \equiv \beta \pmod{\mathbb{Z}}$ for all $k$ relatively prime with the
common denominator of the $\alpha_i, \beta_j$. Then, for any $\sigma \in \mathrm{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$ there exists a
permutation $\rho$ of the summands of $A = \bigoplus_{i=1}^{d} \mathbb{F}_q$ such that $\chi_A(\rho(x)) = \chi_A(x)^\sigma$ for
all $x \in A^\times$. A similar permutation exists for $B$. Notice also that $\mathrm{Tr}(\rho(x)) = \mathrm{Tr}(x)$
and $N(\rho(x)) = N(x)$.

A similar situation arises in the case $A = \mathbb{F}_{p^r}$ as $\mathbb{F}_p$-algebra. Let $\chi_A$ be a
character of order $d$ dividing $p^r - 1$. Let $\rho$ be the $p$-th power Frobenius on $A$;
then $\chi_A(\rho(x)) = \chi_A(x)^p$, a conjugate of $\chi_A(x)$, for all $x \in A^\times$. Notice also that
$\mathrm{Tr}(\rho(x)) = \mathrm{Tr}(x)$ and $N(\rho(x)) = N(x)$.
Definition 2.8 Let $A$ be a finite dimensional $\mathbb{F}_q$-algebra. A ring automorphism $\rho :
A \to A$ is called an $\mathbb{F}_q$-automorphism if it is $\mathbb{F}_q$-linear and it fixes both norm and
trace of $A$.

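Proposition 2.9 Let $A, B$ be finite commutative semisimple $\mathbb{F}_q$-algebras. Let
$\chi_A, \chi_B$ be multiplicative characters. Consider the subgroup $G$ of $\mathrm{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$ consisting of the
elements $\sigma$ for which there exist $\mathbb{F}_q$-automorphisms $\rho_A$ of $A$ and $\rho_B$ of $B$ with
the property that $\chi_A(\rho_A(x)) = \chi_A(x)^\sigma$ and $\chi_B(\rho_B(x)) = \chi_B(x)^\sigma$.
Then $H_q(A, B|t)$ lies in the fixed field of $G$ for every $t \in \mathbb{F}_q^\times$.


Proof Let $\sigma \in G$. We first compute the action of $\sigma$ on $g_A(\chi_A)$. Suppose that
$\sigma(\zeta_p) = \zeta_p^a$. Then

$$\begin{aligned}
g_A(\chi_A)^\sigma &= \sum_{x \in A^\times} \chi_A(x)^\sigma\, \zeta_p^{a\,\mathrm{Tr}(x)}\\
&= \sum_{x \in A^\times} \chi_A(\rho_A(x))\, \zeta_p^{a\,\mathrm{Tr}(\rho_A(x))}\\
&= \sum_{x \in A^\times} \chi_A(a^{-1}x)\, \zeta_p^{\mathrm{Tr}(x)}\\
&= \chi_A(a)^{-1} g_A(\chi_A).
\end{aligned}$$

A similar calculation holds for $B$. Now apply $\sigma$ to the terms in the sum in
Definition 2.4. A similar calculation as above shows that the sum gets multiplied
by $\chi_A(a)^{-1}\chi_B(a)^{-1}$. This cancels the factor coming from $g_A(\chi_A) g_B(\chi_B)$. Hence
$H_q(A, B|t)$ is fixed under all $\sigma \in G$. □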


3 Proof of Theorem 1.5 


We use the notations from the introduction. In particular

$$A(x) = \prod_{j=1}^{d} (x - e^{2\pi i \alpha_j}), \qquad B(x) = \prod_{j=1}^{d} (x - e^{2\pi i \beta_j})$$

and $K$ is the field generated by the coefficients of $A(x)$ and $B(x)$. Let $p$ be a prime
which splits completely in $K$. Then we can consider $A(x)$ as element of $\mathbb{F}_p[x]$.
Let $A(x) = A_1(x) \cdots A_r(x)$ be the irreducible factorization of $A(x)$ in $\mathbb{F}_p[x]$. For
the $\mathbb{F}_p$-algebra we take $\bigoplus_{i=1}^{r} \mathbb{F}_p[x]/(A_i(x))$. The construction of a multiplicative
character on $A$ is as follows. First we choose a multiplicative character $\omega$ on $\overline{\mathbb{F}}_p$
such that its restriction to $\mathbb{F}_{p^r}$ has order $p^r - 1$ for all $r \ge 1$ and fix it in the remainder
of the proof.

Since $p$ splits in $K$, multiplication by $p$ gives a permutation of the multiset $\alpha$
modulo $\mathbb{Z}$. Under this action $\alpha \pmod{\mathbb{Z}}$ decomposes into a union of orbits, which
we call $p$-orbits. Let $O$ be such a $p$-orbit. Then $\prod_{\alpha \in O}(x - e^{2\pi i \alpha})$ is a polynomial
and $p$ splits in the field generated by its coefficients. So we can consider it modulo
a prime ideal dividing $p$ and hence as an element of $\mathbb{F}_p[x]$. It is one of the factors
$A_i(x)$ of the mod $p$ factorization of $A(x)$. The orbit $O$ will now be denoted by $O_i$.
There are $r$ orbits and we renumber the indices of the $\alpha_i$ such that $\alpha_i \in O_i$ for
$i = 1, \ldots, r$. On $\mathbb{F}_p[x]/(A_i)$ we define the multiplicative character $\chi_i = \omega^{\alpha_i(q_i - 1)}$,
where $q_i = p^{\deg(A_i)}$. If we had chosen $p\alpha_i$ instead of $\alpha_i$, the new character
would simply consist of the Frobenius transform followed by $\chi_i$. For the character
$\chi_A$ on $A = \bigoplus_{i=1}^{r} \mathbb{F}_p[x]/(A_i)$ we choose

$$\chi_A(x_1, \ldots, x_r) = \prod_{i=1}^{r} \omega(x_i)^{\alpha_i(q_i - 1)}.$$

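Let $\sigma \in \mathrm{Gal}(\overline{\mathbb{Q}}/K)$. It acts as $\omega(x) \mapsto \omega(x)^{k}$ for some integer $k$. Hence

$$\chi_A(x_1, \ldots, x_r)^{\sigma} = \prod_{i=1}^{r} \omega(x_i)^{k\alpha_i(q_i - 1)}.$$

This permutes the factors by a permutation $s \in S_r$ and we get

$$\chi_A(x_1, \ldots, x_r)^{\sigma} = \prod_{i=1}^{r} \omega(x_{s(i)})^{p^{l_i}\alpha_i(q_i - 1)},$$

where $0 \le l_i < \deg(A_i)$ for each $i$. We used $q_{s(i)} = q_i$. We finally get

$$\chi_A(x_1, \ldots, x_r)^{\sigma} = \prod_{i=1}^{r} \omega\big(x_{s(i)}^{p^{l_i}}\big)^{\alpha_i(q_i - 1)} = \chi_A\big(x_{s(1)}^{p^{l_1}}, \ldots, x_{s(r)}^{p^{l_r}}\big).$$

In other words, $\chi_A(x)^{\sigma} = \chi_A(\rho(x))$ for a suitable $\mathbb{F}_p$-automorphism $\rho$ of $A$.
Notice that norm and trace of $A$ are preserved by $\rho$. A similar construction can
be performed for $B(x)$. According to Proposition 2.9 we get $H_p(A, B|t) \in K$ for
all $t \in \mathbb{F}_p^{\times}$.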

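In order to connect to the $p$-adic function $G_p$ we take the inverse of the
Teichmüller character for $\omega$ and compute the terms given in Definition 2.4
$p$-adically. The Gauss sum $g_A(\chi_A\omega^m)$ is the product of ordinary Gauss sums of the
form $g(\omega^{(q-1)\alpha + m(1 + p + \cdots + p^{l-1})})$ over the field $\mathbb{F}_q$ with $q = p^l$. The occurrence of
$m(1 + p + \cdots + p^{l-1})$ is due to $\omega(N_{\mathbb{F}_q/\mathbb{F}_p}(x))^m = \omega(x)^{m(1 + p + \cdots + p^{l-1})}$. The
Gross-Koblitz theorem for Gauss sums over $\mathbb{F}_q$ with $q = p^l$ gives us

$$g(\omega^{a}) = -\prod_{i=0}^{l-1} \pi^{(p-1)\left\{\frac{p^i a}{q-1}\right\}}\, \Gamma_p\!\left(\left\{\frac{p^i a}{q-1}\right\}\right)$$

for every integer $a$. When applied to $a = (q-1)\alpha + m(q-1)/(p-1)$ this amounts to

$$g\big(\omega^{(q-1)\alpha + m(1 + p + \cdots + p^{l-1})}\big) = -\prod_{i=0}^{l-1} \pi^{(p-1)\left\{p^i\alpha + \frac{m}{p-1}\right\}}\, \Gamma_p\!\left(\left\{p^i\alpha + \frac{m}{p-1}\right\}\right).$$

Note that this is a product over the $p$-orbit containing $\alpha$ and each factor is precisely
of the type that occurs in the definition of the $p$-adic hypergeometric sum. A similar
story goes for $B(x)$. As a result we get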

+) 


BA(XAONIEB(XBON™) am ({ei + rf) Pe ([-# = 
BA (XA)8B (XB) E Ty dai) lp dBi) 


’ 


where $\Lambda(m)$ is as defined in the introduction. So we find that $p$-adically
$H_p(A, B|t) = G_p(\alpha, \beta|t)$.


Hence we conclude that the values of $G_p$ are in $K$. It remains to give an estimate
for the denominator. The conjugates of $H_p(A, B|t)$ are obtained by taking $\chi_A^{k}, \chi_B^{k}$
as multiplicative characters. The corresponding hypergeometric parameters are
$k\alpha, k\beta$. From McCarthy's work it follows that $p^{\lambda} G_p(k\alpha, k\beta|t)$ is a $p$-adic integer
for all $k$ relatively prime to the common denominator of the $\alpha_i, \beta_j$. This implies that
$p^{\lambda} H_p(A, B|t)$ is an algebraic integer in $K$.


L-Series and Feynman Integrals

David Broadhurst and David P. Roberts 


Abstract Integrals from Feynman diagrams with massive particles soon outgrow 
polylogarithms. We consider the simplest situation in which this occurs, namely 
for diagrams with two vertices in two space-time dimensions, with scalar particles 
of unit mass. These comprise vacuum diagrams, on-shell sunrise diagrams and 
diagrams obtained from the latter by cutting internal lines. In all these cases, 
the Feynman integral is a moment of $n = a + b$ Bessel functions, of the form
$M(a,b,c) := \int_0^\infty I_0^a(t) K_0^b(t)\, t^c \, dt$. The corresponding L-series are built from
Kloosterman sums over finite fields. Prior to the Creswick conference, the first
author obtained empirical relations between special values of L-series and Feynman 
integrals with up to n = 8 Bessel functions. At the conference, the second author 
indicated how to extend these. Working together we obtained empirical relations 
involving Feynman integrals with up to 24 Bessel functions, from sunrise diagrams 
with up to 22 loops. We have related results for moments that lie beyond quantum 
field theory. 


1 Physical and Mathematical Context 


The context for our work is given in [1]. At the conference, the first author reported 
on the magnificent progress made by Stefano Laporta, whose solitary decade-long
effort on the magnetic moment of the electron has come to fruition [4]. This
involves, inter alia, moments of 6 Bessel functions, one of which


2 
M(1,5, 1) =f In(t)K§ (tat = = ) s 


is empirically related in [1] to the central value of the L-series of modular weight
4 and conductor 6, with Fourier series $q \prod_{k>0} (1 - q^{k})^2 (1 - q^{2k})^2 (1 - q^{3k})^2 (1 - q^{6k})^2
= \sum_{k>0} A_6(k) q^{k}$. This modular form also delivers the Bessel moments


a L6(3 
M(2,4, 1) =f I (t)Kg@tdt = oe (2) 
0 
me 3L6(2 
M (3, 3, 1) = Ig (t)K§ tdt = st y (3) 
0 
For n = 8 Bessel functions, the L-series of a modular form of weight 6 and 


conductor 6 likewise gives evaluations of M(1,7, 1), M(2,6, 1), M(3,5, 1) and 
M(4,4,1). The challenge presented at the conference was to find comparable 
relations between L-series and moments of n > 8 Bessel functions. 


2 Progress at Creswick 


This challenge was met by considering determinants of Feynman integrals, by 
allowing for adjustment of the local Kloosterman data, at primes that divide the 
conductor, and by empirical determination of the conductor, sign and gamma factors 
that enter the functional equation for the L-series. The good factors are classical and 
much useful information towards conductors was available in [5]. The formalism 
of [2] pointed us to the correct determinants. Using the methods in [3] for numerical 
computation of L-series, we were able to progress beyond the modular forms studied 
in [1]. 
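For readers who want to see the raw ingredient of the L-series, the following is a minimal sketch of a Kloosterman sum over a prime field $\mathbb{F}_p$. The function name `kloosterman` and the restriction to prime fields are assumptions made here for illustration; it is not the computation used by the authors, who work over general finite fields $\mathbb{F}_q$ and build local factors from such data.

```python
import math

def kloosterman(a: int, p: int) -> float:
    """Kloosterman sum K(a; p) = sum over x in F_p^* of exp(2*pi*i*(x + a/x)/p).

    The sum is real, so only the cosine part is accumulated.
    Assumes p is prime.
    """
    total = 0.0
    for x in range(1, p):
        x_inv = pow(x, -1, p)  # modular inverse (Python 3.8+)
        total += math.cos(2 * math.pi * ((x + a * x_inv) % p) / p)
    return total

# Small demonstration: all values for a few primes.
for p in (5, 7, 11):
    print(p, [round(kloosterman(a, p), 6) for a in range(1, p)])
```

Two standard facts give a quick sanity check: for $p \nmid a$ the Weil bound $|K(a;p)| \le 2\sqrt{p}$ holds, and the values for $a = 1, \ldots, p-1$ sum to 1.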

We were successful for odd Bessel numbers up to $n = 17$ and even Bessel
numbers up to $n = 20$. Let $\Omega_{a,b}$ be the determinant of the $r \times r$ matrix with
$M(a, b, 1)$ at top left, size $r = \lceil (a + b)/4 - 1 \rceil$, powers of $t^2$ increasing to the
right and powers of $I_0(t)$ increasing downwards. Then


2) x 29 2215 


Ly) = ——>——= 4 

178) 35 x 52 x Trl? 4) 
2!2 x 11 x 131. Q1.,19 

£200) = STS BaD id 
2!9° x 17x 19 x 23.2 

Lo9(11) = oF he See (6) 


12 s¢°57 5¢ Foie 


are among findings made and presented at the conference.


3 Subsequent Progress


From data on Kloosterman sums in finite fields $\mathbb{F}_q$ with $q < 200{,}000$, we found


2'4 x 1093 x 13171 22,17 


L = 7 
198) 34 x 54x 7x 117? n 
29 x 12558877 21.23 
Lo(12) = —————__————— 8 
24(12) = STS x Ww 1130 (8) 
227 x 17 x 192 x 23? x 466812 
0) = Se (9) 


323 x 512 x 74 x 1127720 


In parallel with the above results for odd moments, we obtained relations between 
even moments of Bessel functions and L-series determined by a quadratic twist 
of Kloosterman data. We have conjecturally complete sets of quadratic relations 
for both types of moment, encoded by Betti and de Rham matrices, for which 
we provide explicit constructions. In some cases where the sign of the functional 
equation is odd, we are able to define regulated moments that deliver central 
derivatives.


Arithmetic Properties of Hypergeometric Mirror Maps and Dwork's Congruences

Eric Delaygue 


Abstract Mirror maps are power series which occur in Mirror Symmetry as the
inverse for composition of $q(z) = \exp(f(z)/g(z))$, called local $q$-coordinates,
where $f$ and $g$ are particular solutions of the Picard-Fuchs differential equations
associated with certain one-parameter families of Calabi—Yau varieties. In several 
cases, it has been observed that such power series have integral Taylor coefficients 
at the origin. In the case of hypergeometric equations, we discuss p-adic tools and 
techniques that enable one to prove a criterion for the integrality of the coefficients 
of mirror maps. This is a joint work with T. Rivoal and J. Roques. This note 
is an extended abstract of the talk given by the author in January 2017 at the 
conference “Hypergeometric motives and Calabi—Yau differential equations” in 
Creswick, Australia. 


1 Arithmetic Conditions for Operators of Calabi-Yau Type 


An irreducible fourth order differential operator $\mathcal{L}$ in $\mathbb{Q}(z)[d/dz]$ is of Calabi-Yau
type if it is of Fuchsian type, self-dual, has 0 as MUM-point and it satisfies certain
arithmetic conditions including that

(i) $\mathcal{L}$ has a solution $\omega_1(z) \in 1 + z\mathbb{C}[[z]]$ at $z = 0$ which is N-integral$^1$;

(ii) $\mathcal{L}$ has a linearly independent solution $\omega_2(z) = G(z) + \log(z)\omega_1(z)$ at $z = 0$
with $G(z) \in z\mathbb{C}[[z]]$ and $\exp(\omega_2(z)/\omega_1(z))$ is N-integral.

$^1$ A power series $f(z) \in 1 + z\mathbb{Q}[[z]]$ is N-integral if there is $c \in \mathbb{Q}^*$ such that $f(cz) \in \mathbb{Z}[[z]]$.

An additional condition is usually considered: the instanton numbers $n_d$ associated
with $\mathcal{L}$ belong to $\frac{1}{N}\mathbb{Z}$ for some non-zero integer $N$. As far as we know, a systematic
approach to prove the integrality of the $n_d$'s has not yet been developed, even in
the case of hypergeometric equations. In this note, we discuss useful $p$-adic tools
to prove or disprove Conditions (i) and (ii). A classical example of a differential
operator satisfying both (i) and (ii) is


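$$\mathcal{L} = \theta^4 - 5z(5\theta + 1)(5\theta + 2)(5\theta + 3)(5\theta + 4),$$

where $\theta = z\frac{d}{dz}$. Consider the two solutions

$$\omega_1(z) = \sum_{n=0}^{\infty} \frac{(5n)!}{(n!)^5} z^n \quad\text{and}\quad \omega_2(z) = G(z) + \log(z)\,\omega_1(z),$$

with

$$G(z) = \sum_{n=1}^{\infty} \frac{(5n)!}{(n!)^5} (5H_{5n} - 5H_n) z^n \quad\text{and}\quad H_n := \sum_{k=1}^{n} \frac{1}{k}.$$

Then $\omega_1(z)$ has integer coefficients and Lian and Yau proved in [10] that

$$\exp\left(\frac{\omega_2(z)}{\omega_1(z)}\right) \in \mathbb{Z}[[z]].$$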

We shall see that the hypergeometric techniques presented in this note allow one to
prove the integrality of the coefficients of $q$-coordinates associated with
non-hypergeometric operators. For example, consider the differential operator

$$\mathcal{L} = \theta^3 - z(34\theta^3 + 51\theta^2 + 27\theta + 5) + z^2(\theta + 1)^3,$$

whose holomorphic solution is the generating series of the Apéry numbers used by
Apéry in his proof of the irrationality of $\zeta(3)$ (see [1]):

$$\omega_1(z) = \sum_{n=0}^{\infty} \sum_{k=0}^{n} \binom{n}{k}^2 \binom{n+k}{k}^2 z^n.$$

A second solution is given by the method of Frobenius and reads $\omega_2(z) = G(z) +
\log(z)\omega_1(z)$, with

$$G(z) = \sum_{n=1}^{\infty} \sum_{k=0}^{n} \binom{n}{k}^2 \binom{n+k}{k}^2 (2H_{n+k} - 2H_{n-k}) z^n.$$

As we will see, a consequence of the results of the author [5] is that

$$\exp\left(\frac{\omega_2(z)}{\omega_1(z)}\right) \in \mathbb{Z}[[z]].$$

First, we present criteria on the integrality of hypergeometric terms.


2 Integrality of Hypergeometric Terms


2.1 Factorial Ratios 


Let $e = (e_1, \ldots, e_u)$ and $f = (f_1, \ldots, f_v)$ be vectors of positive integers. For every
non-negative integer $n$, we set

$$\mathcal{Q}(n) := \frac{(e_1 n)! \cdots (e_u n)!}{(f_1 n)! \cdots (f_v n)!}$$

and we consider the generating series of $\mathcal{Q}$:

$$F(z) = \sum_{n=0}^{\infty} \mathcal{Q}(n) z^n,$$

which is a rescaling of a hypergeometric function. We consider the function $\Delta$ of
Landau defined for every $x$ in $\mathbb{R}$ by

$$\Delta(x) := \sum_{i=1}^{u} \lfloor e_i x \rfloor - \sum_{j=1}^{v} \lfloor f_j x \rfloor.$$

Let $p$ be a prime number. By Legendre's formula, we have

$$v_p(n!) = \sum_{\ell=1}^{\infty} \left\lfloor \frac{n}{p^{\ell}} \right\rfloor,$$

which yields

$$v_p(\mathcal{Q}(n)) = \sum_{\ell=1}^{\infty} \Delta\!\left(\frac{n}{p^{\ell}}\right).$$

Furthermore, we have $\Delta(x) = \Delta(\{x\}) + (|e| - |f|)\lfloor x \rfloor$, where $\{x\}$ is the fractional part
of $x$ and $|e| = e_1 + \cdots + e_u$. Hence the graph of $\Delta$ is essentially determined by its
values on $[0, 1]$. Landau's function provides a useful criterion for the N-integrality
of $F(z)$.


Theorem 2.1 (Landau [9], Bober [2]) The following assertions are equivalent.


(i) $F(z)$ is N-integral;
(ii) $F(z) \in \mathbb{Z}[[z]]$;
(iii) For all $x$ in $[0, 1]$, we have $\Delta(x) \ge 0$.

Landau proved the equivalence of (ii) and (iii) in 1900 while Bober proved in
2009 a result which implies the equivalence with (i). One can easily compute the
jumps of $\Delta$ on $[0, 1]$ to check Assertion (iii).
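Assertion (iii) and the Legendre-type valuation formula are easy to experiment with. The following is a minimal sketch, assuming `e` and `f` are given as tuples of positive integers; the function names are illustrative choices, not notation from the text. Since $\Delta$ is a right-continuous step function whose jumps lie at the points $k/m$ with $m$ in $e$ or $f$, testing those points is enough to decide non-negativity on $[0, 1]$.

```python
from fractions import Fraction
from math import factorial, floor

def landau(e, f, x):
    """Landau's function Delta(x) = sum floor(e_i x) - sum floor(f_j x)."""
    return sum(floor(ei * x) for ei in e) - sum(floor(fj * x) for fj in f)

def landau_nonnegative(e, f):
    """Check Delta(x) >= 0 on [0, 1] by testing every possible jump point k/m."""
    points = {Fraction(k, m) for m in set(e) | set(f) for k in range(m + 1)}
    return all(landau(e, f, x) >= 0 for x in points)

def valuation(n, p):
    """p-adic valuation of a positive integer n."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v

def valuation_via_landau(e, f, n, p):
    """v_p of the factorial ratio at n, as the finite sum of Delta(n / p^l)."""
    total, power = 0, p
    # Terms with p^l > max(e, f) * n vanish, so the sum can stop there.
    while power <= max(e + f) * n:
        total += landau(e, f, Fraction(n, power))
        power *= p
    return total

print(landau_nonnegative((30, 1), (15, 10, 6)))  # Chebyshev's factorial ratio
print(landau_nonnegative((3,), (2, 2)))          # (3n)! / ((2n)!)^2
```

For $e = (3)$, $f = (2, 2)$ the check fails at $x = 1/2$, where $\Delta = 1 - 2 = -1$, matching the fact that $3!/(2!)^2$ is not an integer.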

The generating series of factorial ratios are rescalings of hypergeometric functions
whose parameters have a certain symmetry. Namely, if $\alpha = (\alpha_1, \ldots, \alpha_r)$ and $\beta =
(\beta_1, \ldots, \beta_s)$ are tuples of parameters in $\mathbb{Q} \cap (0, 1]$, then there is $C \in \mathbb{Q}^*$ such that,
for every $n \in \mathbb{N}$, we have

$$C^n \frac{(\alpha_1)_n \cdots (\alpha_r)_n}{(\beta_1)_n \cdots (\beta_s)_n} = \frac{(e_1 n)! \cdots (e_u n)!}{(f_1 n)! \cdots (f_v n)!}$$

if, and only if

$$\frac{(X - e^{2i\pi\alpha_1}) \cdots (X - e^{2i\pi\alpha_r})}{(X - e^{2i\pi\beta_1}) \cdots (X - e^{2i\pi\beta_s})}$$

is a ratio of cyclotomic polynomials. We will see that, when this is not the case, we
still have a criterion for the N-integrality of hypergeometric functions but it involves
several Landau functions: the functions of Christol.


2.2 Generalized Hypergeometric Functions 


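Let $\alpha = (\alpha_1, \ldots, \alpha_r)$ and $\beta = (\beta_1, \ldots, \beta_s)$ be tuples of elements in $\mathbb{Q} \setminus \mathbb{Z}_{\le 0}$. We set

$$F(z) = \sum_{n=0}^{\infty} \frac{(\alpha_1)_n \cdots (\alpha_r)_n}{(\beta_1)_n \cdots (\beta_s)_n} z^n.$$

If $\beta_i = 1$ for some $i$, then $F(z)$ is annihilated by the hypergeometric differential
operator

$$\mathcal{L} = \prod_{i=1}^{s} (\theta + \beta_i - 1) - z \prod_{i=1}^{r} (\theta + \alpha_i),$$

which is irreducible if, and only if $\alpha_i \not\equiv \beta_j \bmod \mathbb{Z}$ for all $i, j$. Elementary calculations show
that $F(z)$ is N-integral if and only if, for almost all primes $p$, we have $F(z) \in
\mathbb{Z}_{(p)}[[z]]$, where $\mathbb{Z}_{(p)}$ is the set of the rational numbers whose denominator is not
divisible by $p$.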

We introduce some definitions to construct useful functions defined by Christol
in [3]. If $x$ is a rational number, then we set

$$\langle x \rangle := \begin{cases} \{x\} & \text{if } x \notin \mathbb{Z}, \\ 1 & \text{otherwise.} \end{cases}$$

We write $\preceq$ for the total order on $\mathbb{R}$ defined by

$$x \preceq y \iff \big(\langle x \rangle < \langle y \rangle \text{ or } (\langle x \rangle = \langle y \rangle \text{ and } x \ge y)\big).$$

Let $d$ be the least common multiple of the exact denominators of the $\alpha_i$'s and $\beta_j$'s. For
all $a$ coprime to $d$, $1 \le a \le d$, we set

$$\xi_a(x) := \#\{1 \le i \le r : a\alpha_i \preceq x\} - \#\{1 \le j \le s : a\beta_j \preceq x\}.$$

Then we have the following criterion for the N-integrality of $F(z)$.

Theorem 2.2 (Christol [3]) The following assertions are equivalent.


(i) $F(z)$ is N-integral;
(ii) For all $a$ coprime to $d$, $1 \le a \le d$, and all $x$ in $\mathbb{R}$, we have $\xi_a(x) \ge 0$.
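Christol's criterion is finite and can be tested directly. The sketch below assumes the parameters are given as tuples of `Fraction` objects and reads the order as $x \preceq y$ iff $\langle x \rangle < \langle y \rangle$, or $\langle x \rangle = \langle y \rangle$ and $x \ge y$; all function names are hypothetical. Because $\xi_a$ only changes value at the points $a\alpha_i$ and $a\beta_j$, it suffices to evaluate it there. The call to `math.lcm` needs Python 3.9 or later.

```python
from fractions import Fraction
from math import gcd, lcm, floor

def bracket(x):
    """<x> = fractional part of x if x is not an integer, and 1 otherwise."""
    frac = x - floor(x)
    return frac if frac != 0 else Fraction(1)

def order_key(x):
    """Sort key realising the order: x precedes y iff key(x) <= key(y)."""
    return (bracket(x), -x)

def xi(a, alpha, beta, x):
    """Christol's function xi_a(x) for parameter tuples alpha and beta."""
    kx = order_key(x)
    return (sum(order_key(a * t) <= kx for t in alpha)
            - sum(order_key(a * t) <= kx for t in beta))

def christol_criterion(alpha, beta):
    """True iff xi_a(x) >= 0 for all a coprime to d and all real x."""
    d = lcm(*(t.denominator for t in alpha + beta))
    for a in range(1, d + 1):
        if gcd(a, d) != 1:
            continue
        # xi_a is constant between consecutive points a*alpha_i, a*beta_j.
        if any(xi(a, alpha, beta, a * t) < 0 for t in alpha + beta):
            return False
    return True

half, one = Fraction(1, 2), Fraction(1)
print(christol_criterion((half, half), (one, one)))  # binom(2n, n)^2 / 16^n
print(christol_criterion((one, one), (half, one)))   # 4^n / binom(2n, n)
```

The first example is the rescaled series of squared central binomial coefficients and passes; the second has coefficients $4^n / \binom{2n}{n}$ and fails already at $x = 1/2$.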


If $F(z)$ is N-integral, then the set of constants $c \in \mathbb{Q}$ such that $F(cz) \in \mathbb{Z}[[z]]$
is $C\mathbb{Z}$ for some $C \in \mathbb{Q}^*$. When $F(z)$ is algebraic over $\mathbb{Q}(z)$, then $F(z)$ is
N-integral and $C$ is called the Eisenstein constant of $F$. Hence we shall also call $C$ the
Eisenstein constant of $F(z)$. Rivoal, Roques and the author gave in [6] a formula
for $C$ when the parameters of the hypergeometric function belong to $(0, 1]$.

For every prime $p$, we set

$$\lambda_p := \#\{1 \le i \le r : \alpha_i \in \mathbb{Z}_{(p)}\} - \#\{1 \le j \le s : \beta_j \in \mathbb{Z}_{(p)}\}.$$

If $\alpha$ is a rational number, then we write $\mathrm{den}(\alpha)$ for its exact denominator. As a
particular case of Theorem 1 in [6], we have the following formula.


Theorem 2.3 If $\alpha$ and $\beta$ are tuples of elements in $(0, 1]$, $r = s$ and $F(z)$ is
N-integral, then the Eisenstein constant of $F$ is

$$C = \frac{\prod_{i=1}^{r} \mathrm{den}(\alpha_i)}{\prod_{j=1}^{s} \mathrm{den}(\beta_j)} \prod_{p \mid d} p^{-\left\lfloor \frac{\lambda_p}{p-1} \right\rfloor}.$$

In the case of factorial ratios, if $e = (e_1, \ldots, e_u)$ and $f = (f_1, \ldots, f_v)$ are tuples
of positive integers, then we have

$$\frac{(e_1 n)! \cdots (e_u n)!}{(f_1 n)! \cdots (f_v n)!} = \left( \frac{e_1^{e_1} \cdots e_u^{e_u}}{f_1^{f_1} \cdots f_v^{f_v}} \right)^{n} \frac{\prod_{i=1}^{u} \prod_{j=1}^{e_i} (j/e_i)_n}{\prod_{i=1}^{v} \prod_{j=1}^{f_i} (j/f_i)_n}.$$

If the associated generating series is (N-)integral then the Eisenstein constant is
indeed

$$\frac{e_1^{e_1} \cdots e_u^{e_u}}{f_1^{f_1} \cdots f_v^{f_v}}.$$


2.2.1 Landau-Like Functions


To prove Theorem 2.3, we use Landau-like functions to calculate the $p$-adic
valuation of Pochhammer symbols. To define those functions, we first consider
a map $D_p$ introduced by Dwork as follows.

Let $p$ be a prime and $\alpha$ in $\mathbb{Z}_{(p)}$. We write $D_p(\alpha)$ for the unique element in $\mathbb{Z}_{(p)}$
satisfying $pD_p(\alpha) - \alpha \in \{0, \ldots, p - 1\}$. We have $D_p(1) = 1$ and if $\alpha = r/N$ with
$r$ coprime to $N \ge 2$, $1 \le r < N$, then

$$D_p(\alpha) = \frac{s_N\big(\pi_N(p)^{-1} \pi_N(r)\big)}{N},$$

where $s_N$ is the section of the canonical morphism $\pi_N : \mathbb{Z} \to \mathbb{Z}/N\mathbb{Z}$ with values in
$\{0, \ldots, N - 1\}$.
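Dwork's map is straightforward to compute from the formula above. The following is a minimal sketch for $\alpha \in (0, 1]$ and a prime $p$ not dividing the denominator of $\alpha$; the names `dwork_map` and `defining_property` are illustrative. It uses Python's built-in modular inverse `pow(p, -1, N)`, available from Python 3.8.

```python
from fractions import Fraction

def dwork_map(alpha: Fraction, p: int) -> Fraction:
    """D_p(alpha) for alpha in (0, 1] whose denominator is prime to the prime p."""
    r, N = alpha.numerator, alpha.denominator
    if N % p == 0:
        raise ValueError("alpha must lie in Z_(p)")
    if N == 1:
        return Fraction(1)  # alpha = 1, and D_p(1) = 1
    p_inv = pow(p, -1, N)   # inverse of p modulo N
    return Fraction((p_inv * r) % N, N)

def defining_property(alpha: Fraction, p: int) -> bool:
    """Check that p * D_p(alpha) - alpha is an integer in {0, ..., p - 1}."""
    diff = p * dwork_map(alpha, p) - alpha
    return diff.denominator == 1 and 0 <= diff < p

print(dwork_map(Fraction(1, 3), 5))   # 2/3, since 5 * 2/3 - 1/3 = 3
print(defining_property(Fraction(1, 3), 5))
```

Since $D_p$ preserves the denominator and acts bijectively on numerators coprime to it, it permutes the set $\{j/e : 1 \le j \le e\}$ whenever $p \nmid e$, which is the permutation property used later for factorial ratios.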

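If $p$ does not divide $d$, then, for all positive integers $\ell$, we define the Landau-like
function

$$\Delta_{p,\ell}(x) := \sum_{i=1}^{r} \left\lfloor x - D_p^{\ell}(\alpha_i) - \frac{\lfloor 1 - \alpha_i \rfloor}{p^{\ell}} \right\rfloor - \sum_{j=1}^{s} \left\lfloor x - D_p^{\ell}(\beta_j) - \frac{\lfloor 1 - \beta_j \rfloor}{p^{\ell}} \right\rfloor + r - s.$$

Christol proved in [3] a Legendre-like formula involving Landau-like functions.


Theorem 2.4 If $p$ does not divide $d$, then we have

$$v_p\left( \frac{(\alpha_1)_n \cdots (\alpha_r)_n}{(\beta_1)_n \cdots (\beta_s)_n} \right) = \sum_{\ell=1}^{\infty} \Delta_{p,\ell}\left( \frac{n}{p^{\ell}} \right).$$

Our first task to prove Theorem 2.3 was to find a convenient analog of Legendre's
formula when $p$ is a divisor of $d$. In this case, Dwork's maps are not defined for
all parameters $\alpha_i$ and $\beta_j$. To that end, we proved in [6] an average formula for
primes dividing $d$.

Theorem 2.5 Assume that $\alpha$ and $\beta$ are tuples of $r$ elements in $(0, 1]$ such that $F(z)$
is N-integral. Let $p$ be a prime divisor of $d$ and write $d = p^{f} D$ where $D$ is coprime
to $p$.

For every $a$ coprime to $p$, $1 \le a \le p^{f}$, and all positive integers $\ell$, we choose a
prime $p_{a,\ell}$ satisfying $p_{a,\ell} \equiv p^{\ell} \bmod D$ and $p_{a,\ell} \equiv a \bmod p^{f}$. Then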
, 
n @1)n +++ (Or)n 1 S y (= 7) Ap 
Cc oe = —— 

vn ( ea Got) d ZA Pa,tsI al me 


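gcd(a, p)=1

In the case of factorial ratios, we have again

$$\frac{(e_1 n)! \cdots (e_u n)!}{(f_1 n)! \cdots (f_v n)!} = C^n \frac{(\alpha_1)_n \cdots (\alpha_r)_n}{(\beta_1)_n \cdots (\beta_s)_n},$$

where $\alpha$ and $\beta$ are tuples of elements in $(0, 1]$. For every $p$ not dividing $d$ and every $\ell$, the
map $D_p^{\ell}$ induces a permutation on $\alpha$ and $\beta$. Hence we have

$$\begin{aligned}
\Delta_{p,\ell}(x) &= \sum_{i=1}^{r} \left\lfloor x - D_p^{\ell}(\alpha_i) - \frac{\lfloor 1 - \alpha_i \rfloor}{p^{\ell}} \right\rfloor - \sum_{j=1}^{s} \left\lfloor x - D_p^{\ell}(\beta_j) - \frac{\lfloor 1 - \beta_j \rfloor}{p^{\ell}} \right\rfloor + r - s \\
&= \sum_{i=1}^{r} \left\lfloor x - D_p^{\ell}(\alpha_i) \right\rfloor - \sum_{j=1}^{s} \left\lfloor x - D_p^{\ell}(\beta_j) \right\rfloor + r - s \\
&= \sum_{i=1}^{r} \lfloor x - \alpha_i \rfloor - \sum_{j=1}^{s} \lfloor x - \beta_j \rfloor + r - s \\
&= \sum_{i=1}^{u} \lfloor e_i x \rfloor - \sum_{j=1}^{v} \lfloor f_j x \rfloor \\
&= \Delta(x).
\end{aligned}$$

Hence, in both cases, the formulas of Theorems 2.4 and 2.5 reduce to Legendre's
one.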


3 Integrality of the Coefficients of g-Coordinates 


3.1 A Glimpse of Dwork’s Result 


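Consider the power series

$$F(z) = \sum_{n=0}^{\infty} \frac{(\alpha_1)_n \cdots (\alpha_r)_n}{(\beta_1)_n \cdots (\beta_s)_n} z^n$$

and

$$G(z) = \sum_{n=1}^{\infty} \frac{(\alpha_1)_n \cdots (\alpha_r)_n}{(\beta_1)_n \cdots (\beta_s)_n} \left( \sum_{i=1}^{r} H_{\alpha_i}(n) - \sum_{j=1}^{s} H_{\beta_j}(n) \right) z^n,$$

where, for $n \in \mathbb{N}$ and $x \in \mathbb{Q} \setminus \mathbb{Z}_{\le 0}$, we set $H_x(n) := \sum_{k=0}^{n-1} \frac{1}{x + k}$.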




Then $G(z) + \log(z)F(z)$ is annihilated by the hypergeometric operator $\mathcal{L}$ if there
are at least two 1's in $\beta$. The $q$-coordinate is

$$q(z) = \exp\left( \frac{G(z) + \log(z)F(z)}{F(z)} \right) = z \exp\left( \frac{G(z)}{F(z)} \right).$$

A consequence of a lemma of Dieudonné and Dwork is that, for every prime $p$,
we have

$$q(z) \in \mathbb{Z}_p[[z]] \iff \frac{G}{F}(z^p) - p\,\frac{G}{F}(z) \in p\mathbb{Z}_p[[z]].$$

Let $p$ be a prime not dividing $d$ and write $F_1(z)$ (resp. $G_1(z)$) for $F(z)$ (resp.
$G(z)$) with the substitutions

$$\alpha \leftrightarrow (D_p(\alpha_1), \ldots, D_p(\alpha_r)) \quad\text{and}\quad \beta \leftrightarrow (D_p(\beta_1), \ldots, D_p(\beta_s)).$$

Then Dwork proved in [7] the following. Assume that $r = s$, that for all $\ell \in \mathbb{N}$, $D_p^{\ell}(\beta_i) \in
\mathbb{Z}_p^{\times}$, plus some fundamental but hard to read interlacing conditions (depending on
$p$) on elements of $\alpha$ and $\beta$. Then we have

$$\frac{G_1}{F_1}(z^p) - p\,\frac{G}{F}(z) \in p\mathbb{Z}_p[[z]].$$

In particular, if $D_p$ induces a permutation on $\alpha$ and $\beta$, which is the case for
factorial ratios, then $F_1 = F$, $G_1 = G$ and Dwork's result yields

$$\frac{G}{F}(z^p) - p\,\frac{G}{F}(z) \in p\mathbb{Z}_p[[z]],$$

so that $q(z) \in \mathbb{Z}_p[[z]]$.


3.2 Factorial Ratios


If the interlacing conditions hold for every (explicitly) large enough prime $p$, then
$q(z)$ is N-integral. Methods for the remaining primes were developed by Lian-Yau
[10], Zudilin [11], Krattenthaler-Rivoal [8] for infinite families of factorial ratios,
yielding proofs of $q(Cz) \in \mathbb{Z}[[z]]$ where $C$ is the Eisenstein constant of $F(z)$.

In the case of factorial ratios, we have

$$G(z) = \sum_{n=1}^{\infty} \frac{(e_1 n)! \cdots (e_u n)!}{(f_1 n)! \cdots (f_v n)!} \left( \sum_{i=1}^{u} e_i H_{e_i n} - \sum_{j=1}^{v} f_j H_{f_j n} \right) z^n$$

and

$$\Delta(x) = \sum_{i=1}^{u} \lfloor e_i x \rfloor - \sum_{j=1}^{v} \lfloor f_j x \rfloor.$$

We gave a criterion for the integrality of the Taylor coefficients of $q(z)$ in 2012
(see [4]).


Theorem 3.1 If $F(z)$ is N-integral with Eisenstein constant $C$, then the following
assertions are equivalent.


(i) $q(z)$ is N-integral;
(ii) $q(Cz) \in \mathbb{Z}[[z]]$;
(iii) we have $|e| = |f|$ and, for all $x \in [1/M, 1)$, we have $\Delta(x) \ge 1$, where $M$ is
the largest element in $e$ and $f$.


The proof of (iii) $\Rightarrow$ (i) is essentially a consequence of Dwork's results.
Legendre's formula and Landau's functions play an important role in the proof
of Theorem 3.1. When $F(z)$ is the generating series of multisums of binomial
coefficients (such as Apéry numbers), it seems impossible to apply an analog of the
proof of Theorem 3.1. To prove the integrality of the coefficients of the associated
$q$-coordinate, we prove a generalization of Theorem 3.1 to several variables and
then we specialize the multivariate $q$-coordinates.


3.3 Factorial Ratios of Linear Forms 


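Let $e = (\mathbf{e}_1, \ldots, \mathbf{e}_u)$ and $f = (\mathbf{f}_1, \ldots, \mathbf{f}_v)$ be tuples of nonzero vectors in $\mathbb{N}^d$.
Consider

$$F(\mathbf{z}) = \sum_{\mathbf{n} \in \mathbb{N}^d} \frac{(\mathbf{e}_1 \cdot \mathbf{n})! \cdots (\mathbf{e}_u \cdot \mathbf{n})!}{(\mathbf{f}_1 \cdot \mathbf{n})! \cdots (\mathbf{f}_v \cdot \mathbf{n})!} \mathbf{z}^{\mathbf{n}}.$$

For every $k \in \{1, \ldots, d\}$, write

$$G_k(\mathbf{z}) = \sum_{\mathbf{n} \in \mathbb{N}^d} \frac{(\mathbf{e}_1 \cdot \mathbf{n})! \cdots (\mathbf{e}_u \cdot \mathbf{n})!}{(\mathbf{f}_1 \cdot \mathbf{n})! \cdots (\mathbf{f}_v \cdot \mathbf{n})!} \left( \sum_{i=1}^{u} \mathbf{e}_i^{(k)} H_{\mathbf{e}_i \cdot \mathbf{n}} - \sum_{j=1}^{v} \mathbf{f}_j^{(k)} H_{\mathbf{f}_j \cdot \mathbf{n}} \right) \mathbf{z}^{\mathbf{n}},$$

where $\mathbf{e}_i^{(k)}$ is the $k$-th component of $\mathbf{e}_i$. The $q$-coordinates are

$$q_k(\mathbf{z}) = z_k \exp\left( \frac{G_k(\mathbf{z})}{F(\mathbf{z})} \right), \quad 1 \le k \le d.$$

The associated Landau function is

$$\Delta(\mathbf{x}) = \sum_{i=1}^{u} \lfloor \mathbf{e}_i \cdot \mathbf{x} \rfloor - \sum_{j=1}^{v} \lfloor \mathbf{f}_j \cdot \mathbf{x} \rfloor, \quad (\mathbf{x} \in \mathbb{R}^d).$$

The non-trivial zone for $\Delta$ is defined by

$$\mathcal{D} := \{\mathbf{x} \in [0, 1)^d : \text{there is } \mathbf{d} \text{ in } e \text{ or } f \text{ such that } \mathbf{d} \cdot \mathbf{x} \ge 1\}.$$

Observe that if $\mathbf{x}$ belongs to $[0, 1)^d \setminus \mathcal{D}$, then we have $\Delta(\mathbf{x}) = 0$. We proved in [5]
the following criterion.


Theorem 3.2 Assume that $F(\mathbf{z}) \in \mathbb{Z}[[\mathbf{z}]]$. Then the following assertions are
equivalent:


(i) For every $k$, we have $q_k(\mathbf{z}) \in \mathbb{Z}[[\mathbf{z}]]$;
(ii) we have $|e| = |f|$ and, for every $\mathbf{x} \in \mathcal{D}$, $\Delta(\mathbf{x}) \ge 1$.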
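To apply Theorem 3.2 to the case of Apéry numbers (associated with $\zeta(3)$), we
consider the bivariate power series

$$F(x, y) = \sum_{n_1, n_2 \ge 0} \frac{(2n_1 + n_2)!^2}{n_1!^4\, n_2!^2} x^{n_1} y^{n_2}$$

and

$$G_2(x, y) = \sum_{n_1, n_2 \ge 0} \frac{(2n_1 + n_2)!^2}{n_1!^4\, n_2!^2} (2H_{2n_1 + n_2} - 2H_{n_2}) x^{n_1} y^{n_2}.$$

In this case, we have

$$\Delta(x, y) = 2\lfloor 2x + y \rfloor - 4\lfloor x \rfloor - 2\lfloor y \rfloor.$$

We have

$$\mathcal{D} = \{(x, y) \in [0, 1)^2 : 2x + y \ge 1\}$$

and if $\mathbf{x} \in \mathcal{D}$, then $\Delta(\mathbf{x}) \ge 2$. Hence we have $q_2(x, y) \in \mathbb{Z}[[x, y]]$ by Theorem 3.2.
Taking $x = y$ yields

$$q_2(x, x) = x \exp\left( \frac{G_2(x, x)}{F(x, x)} \right) \in \mathbb{Z}[[x]],$$

where

$$F(x, x) = \sum_{n=0}^{\infty} \sum_{k=0}^{n} \binom{n}{k}^2 \binom{n+k}{k}^2 x^n$$

and

$$G_2(x, x) = \sum_{n=1}^{\infty} \sum_{k=0}^{n} \binom{n}{k}^2 \binom{n+k}{k}^2 (2H_{n+k} - 2H_{n-k}) x^n,$$

as expected.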
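The specialization $x = y$ and the lower bound on the non-trivial zone can both be checked numerically. The sketch below is an illustration with hypothetical function names: it verifies that the diagonal coefficients of $F(x, y)$ are the Apéry numbers, and it samples $\Delta(x, y) = 2\lfloor 2x + y \rfloor - 4\lfloor x \rfloor - 2\lfloor y \rfloor$ on a rational grid inside the zone $2x + y \ge 1$. The grid test is only a finite sample, not a proof of the bound.

```python
from fractions import Fraction
from math import comb, factorial, floor

def bivariate_coefficient(n1: int, n2: int) -> int:
    """Coefficient (2 n1 + n2)!^2 / (n1!^4 n2!^2) of x^n1 y^n2 in F(x, y)."""
    return factorial(2 * n1 + n2) ** 2 // (factorial(n1) ** 4 * factorial(n2) ** 2)

def apery(n: int) -> int:
    """Apery number sum_k C(n, k)^2 C(n + k, k)^2."""
    return sum(comb(n, k) ** 2 * comb(n + k, k) ** 2 for k in range(n + 1))

def diagonal_coefficient(n: int) -> int:
    """Coefficient of x^n in F(x, x): sum over n1 + n2 = n."""
    return sum(bivariate_coefficient(k, n - k) for k in range(n + 1))

def landau(x: Fraction, y: Fraction) -> int:
    """Delta(x, y) = 2 floor(2x + y) - 4 floor(x) - 2 floor(y)."""
    return 2 * floor(2 * x + y) - 4 * floor(x) - 2 * floor(y)

def min_on_zone(grid: int) -> int:
    """Minimum of Delta over grid points of [0, 1)^2 with 2x + y >= 1."""
    pts = [(Fraction(i, grid), Fraction(j, grid))
           for i in range(grid) for j in range(grid)]
    return min(landau(x, y) for x, y in pts if 2 * x + y >= 1)

print([apery(n) for n in range(4)])
print([diagonal_coefficient(n) for n in range(4)])
print(min_on_zone(12))
```

The agreement of the two coefficient lists reflects the identity $\binom{n}{k}\binom{n+k}{k} = (n+k)!/(k!^2 (n-k)!)$ with $n_1 = k$ and $n_2 = n - k$.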


3.4 Generalized Hypergeometric q-Coordinates 


In this section, we briefly comment on analogous results in the (univariate) general case.
Write $m(a)$ for the smallest element in $(\{a\alpha_1, \ldots, a\alpha_r, a\beta_1, \ldots, a\beta_s\}, \preceq)$. We
consider the following assertion, denoted $H$: For all $a$ coprime to $d$, $1 \le a \le d$, for
all $x \in \mathbb{R}$ satisfying $m(a) \preceq x \prec a$, we have $\xi_a(x) \ge 1$. We consider a product of
$q$-coordinates whose N-integrality is strongly related to the one of $q(z)$:

$$\tilde{q}(z) = \prod_{a=1,\ \gcd(a,d)=1}^{d} q_{\langle a\alpha \rangle, \langle a\beta \rangle}(z).$$

Then we proved in [6] the following criterion.

Theorem 3.3 Assume that $\mathcal{L}$ is irreducible and that $F(z)$ is N-integral. Then
(i) if $r = s$ and Assertion $H$ holds, then $q(z)$ is N-integral.

Furthermore, the following assertions are equivalent:


(ii) $q(z)$ is N-integral;
(iii) $\tilde{q}(z)$ is N-integral and $\tilde{q}(z) = q(z)^{\varphi(d)}$.


3.5 A Brief Overview of the p-Adic Strategy 


The first step is to reduce the problem for each prime by the following classical
result: if $x \in \mathbb{Q}$, then $x \in \mathbb{Z}$ if and only if $x \in \mathbb{Z}_p$ for all primes $p$.

Then we get rid of the exponential by applying the lemma of Dieudonné and
Dwork.

Lemma 3.4

$$z \exp\left( \frac{G(z)}{F(z)} \right) \in \mathbb{Z}_p[[z]] \iff \frac{G}{F}(z^p) - p\,\frac{G}{F}(z) \in pz\mathbb{Z}_p[[z]].$$

Then, in all proofs, one has to generalize a theorem on formal congruences of
Dwork to prove that

$$F_{s-1}(z^p) F(z) \equiv F(z^p) F_s(z) \mod p^{s}\mathbb{Z}_p[[z]], \quad (\forall s \ge 1),$$

where $F_s(z) = \sum_{n=0}^{p^s - 1} a_n z^n$ and $F(z) = \sum_{n=0}^{\infty} a_n z^n$.
The last main step is to prove congruences for harmonic numbers $H_x(n)$ and the
$p$-adic Gamma function.


Appell-Lauricella Hypergeometric Functions over Finite Fields, and a New Cubic Transformation Formula

Sharon Frechette, Holly Swisher, and Fang-Ting Tu 


Abstract We define a finite-field version of Appell—Lauricella hypergeometric 
functions built from period functions in several variables, paralleling the 
development by Fuselier et al. (Hypergeometric functions over finite fields, 
arXiv:1510.02575v2) in the single variable case. We develop geometric connections 
between these functions and the family of generalized Picard curves. In our main 
result, we use finite-field Appell—Lauricella functions to establish a finite-field 
analogue of Koike and Shiga’s cubic transformation (Koike and Shiga, J. Number 
Theory 124:123-141, 2007) for the Appell hypergeometric function $F_1$, proving a
conjecture of Ling Long. We also prove a finite field analogue of Gauss' quadratic
arithmetic geometric mean. We use our multivariable period functions to construct
formulas for the number of $\mathbb{F}_p$-points on the generalized Picard curves. Lastly,
we give some transformation and reduction formulas for the period functions, and 
consequently for the finite-field Appell—Lauricella functions. 


1 Cubic Transformation Formulas 


Classical hypergeometric functions are among the most versatile of all special
functions. These functions and their finite-field analogues have numerous applications
in number theory and geometry. For instance, finite-field hypergeometric functions
play a role in proving congruences and supercongruences, they count points modulo
$p$ over algebraic varieties and affine hypersurfaces, and in certain instances they
provide formulas for the Fourier coefficients of modular forms. We define finite-field
hypergeometric functions $\mathbb{F}_D^{(n)}$ in several variables, as an analogue of the classical
Appell-Lauricella hypergeometric functions of type D. Lauricella's series of type D
[15] give a natural generalization of Appell's $F_1$ functions [1, 2] to $n$ variables and
are closely related to generalized Picard curves [18]. Following the literature, we
refer to these generalizations as Appell-Lauricella functions. For a comprehensive
survey of Appell-Lauricella functions, we refer the reader to the article by Schlosser
[19], and to the monograph by Slater [20]. Furthermore, we note that classical
hypergeometric functions as well as Appell-Lauricella functions are examples of
a more general class called A-hypergeometric functions introduced and studied by
Gelfand et al. [10], and further studied by Beukers [4].

We develop the theory of these $\mathbb{F}_D^{(n)}$ finite-field hypergeometric functions in
several variables, with a focus on their geometric connections to the generalized
Picard curves. This parallels the construction (by the second and third authors et al.)
in [9], categorizing the interplay between classical and finite-field hypergeometric
functions in the single-variable setting.

Our results are motivated by a conjecture of Ling Long, related to identities
proved by Koike and Shiga [13, 14]. In [13], Koike and Shiga applied Appell's $F_1$
hypergeometric function in two variables to establish a new three-term arithmetic
geometric mean result (AGM), related to Picard modular forms. As a consequence
of this cubic AGM, Koike and Shiga proved the following cubic transformation for
Appell's $F_1$-function. Let $x, y \in \mathbb{C}$, and let $\omega$ be a primitive cubic root of unity.
Then

$$F_1\left[ \tfrac{1}{3};\ \tfrac{1}{3}, \tfrac{1}{3};\ 1 \,\middle|\, 1 - x^3,\ 1 - y^3 \right] = \frac{3}{1 + x + y}\, F_1\left[ \tfrac{1}{3};\ \tfrac{1}{3}, \tfrac{1}{3};\ 1 \,\middle|\, \left( \frac{1 + \omega x + \omega^2 y}{1 + x + y} \right)^{3},\ \left( \frac{1 + \omega^2 x + \omega y}{1 + x + y} \right)^{3} \right]. \tag{1}$$

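As an application of Appell-Lauricella functions over finite fields, we prove the
following finite-field analogue of Koike and Shiga's transformation, as conjectured
by Ling Long.

Theorem 1 Let $q$ be a prime power with $q \equiv 1 \pmod 3$, let $\omega$ be a primitive cubic
root of unity, and let $\eta_3$ be a primitive cubic character in $\widehat{\mathbb{F}_q^{\times}}$. If $\lambda, \mu \in \mathbb{F}_q$ satisfy
$1 + \lambda + \mu \ne 0$, then

$$\mathbb{F}_D^{(2)}\left[ \eta_3;\ \eta_3, \eta_3;\ \varepsilon \,\middle|\, 1 - \lambda^3,\ 1 - \mu^3 \right] = \mathbb{F}_D^{(2)}\left[ \eta_3;\ \eta_3, \eta_3;\ \varepsilon \,\middle|\, \left( \frac{1 + \omega\lambda + \omega^2\mu}{1 + \lambda + \mu} \right)^{3},\ \left( \frac{1 + \omega^2\lambda + \omega\mu}{1 + \lambda + \mu} \right)^{3} \right].$$

When $\lambda = \mu$, we have the following corollary.


Corollary 1 For $q$ a prime power with $q \equiv 1 \pmod 3$, and $\omega$ as above, if $\lambda \in \mathbb{F}_q$
satisfies $1 + 2\lambda \ne 0$, then

$${}_2\mathbb{F}_1\left[ \eta_3,\ \eta_3^2;\ \varepsilon \,\middle|\, 1 - \lambda^3 \right] = {}_2\mathbb{F}_1\left[ \eta_3,\ \eta_3^2;\ \varepsilon \,\middle|\, \left( \frac{1 - \lambda}{1 + 2\lambda} \right)^{3} \right].$$

The result of Corollary 1 was first established in [9], using a different method of
proof. It is a finite-field version of the cubic transformation

$${}_2F_1\left[ \tfrac{1}{3},\ \tfrac{2}{3};\ 1 \,\middle|\, 1 - x^3 \right] = \frac{3}{1 + 2x}\, {}_2F_1\left[ \tfrac{1}{3},\ \tfrac{2}{3};\ 1 \,\middle|\, \left( \frac{1 - x}{1 + 2x} \right)^{3} \right] \tag{2}$$

proved by Borwein and Borwein [5, 6] for $x \in \mathbb{R}$ with $0 < x < 1$, as a cubic
analogue of Gauss' quadratic AGM.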


2 Quadratic Transformations: Revisiting Gauss’ Quadratic 
AGM 


In [9], the authors give a dictionary for the correspondence between results on 
classical hypergeometric functions and finite-field hypergeometric functions. Given 
a transformation for classical hypergeometric functions, this dictionary can be used 
to predict the form of the analogous transformation for finite-field hypergeometric 
functions. They also use a calculus-style method of converting the proofs of classical 
identities to the finite-field setting, provided the classical identity satisfies the 
following condition: It can be proved using only the binomial theorem, the reflection
and multiplication formulas [for the gamma function], or their corollaries (such as
the Pfaff-Saalschütz formula) [9].

We illustrate this calculus-style method of translating classical results by proving 
the following theorem. 


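Theorem 2 The quadratic arithmetic-geometric mean of Gauss, given for $x \in \mathbb{C}$
by

$${}_2F_1\left[ \tfrac{1}{2},\ \tfrac{1}{2};\ 1 \,\middle|\, 1 - x^2 \right] = \frac{2}{1 + x}\, {}_2F_1\left[ \tfrac{1}{2},\ \tfrac{1}{2};\ 1 \,\middle|\, \left( \frac{1 - x}{1 + x} \right)^{2} \right], \tag{3}$$

can be proved using only the binomial theorem, the reflection and duplication
formulas for the gamma function, and special evaluations of the ${}_3F_2$ and ${}_2F_1$
functions, using the Pfaff-Saalschütz formula and Gauss' formula, respectively.

As a consequence, translating this alternate proof of Gauss' quadratic AGM
and analyzing the associated error terms on the finite-field side, we also obtain the
following corollary.


Corollary 2 Let $p \equiv 1 \pmod 4$ be prime, let $\phi$ be the quadratic character in $\widehat{\mathbb{F}_p^{\times}}$,
and let $\varepsilon$ be the trivial character in $\widehat{\mathbb{F}_p^{\times}}$. If $\lambda \in \mathbb{F}_p$ satisfies $1 + \lambda \ne 0$, then

$${}_2\mathbb{F}_1\left[ \phi,\ \phi;\ \varepsilon \,\middle|\, 1 - \lambda^2 \right] = {}_2\mathbb{F}_1\left[ \phi,\ \phi;\ \varepsilon \,\middle|\, \left( \frac{1 - \lambda}{1 + \lambda} \right)^{2} \right]. \tag{4}$$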

3 Connections to Picard Curves 


Taking the approach used in [9], our finite-field Appell-Lauricella hypergeometric
functions are defined as normalizations of finite-field period functions $\mathbb{P}_D^{(n)}$, which
we also define. These period functions are naturally related to periods of the
generalized Picard curves

$$C_{\lambda}^{[N;\, i, j, \mathbf{k}]} : y^N = x^{i} (1 - x)^{j} (1 - \lambda_1 x)^{k_1} \cdots (1 - \lambda_n x)^{k_n}, \tag{5}$$

defined for distinct complex numbers $\lambda_1, \ldots, \lambda_n \ne 0, 1$ and positive integers
$N, i, j, k_1, \ldots, k_n$ that satisfy the conditions $\gcd(N, i, j, k_1, \ldots, k_n) = 1$ and
$N \nmid i + j + k_1 + \cdots + k_n$. As a consequence, the $\mathbb{P}_D^{(n)}$ functions are ideally
suited for counting $\mathbb{F}_q$-points on Picard curves. We prove a theorem which gives
the number of $\mathbb{F}_q$-points on the generalized Picard curves in a simple, elegant
formula. This is analogous to the point-counting result for the generalized Legendre
curves that was established by the second and third authors, et al. in [7, 8]. We also
compute the genus of the generalized Picard curves $C_{\lambda}^{[N;\, i, j, \mathbf{k}]}$ following methods
of Archinard [3].


4 Transformation and Reduction Formulas 


Transformation and reduction formulas for classical hypergeometric functions have
been successfully translated to the finite-field setting, first by Greene and also
by authors such as McCarthy, and Fuselier et al. (See [9, 11, 17] for details.)
Transformation formulas for classical Appell-Lauricella hypergeometric functions,
many of which can be found in the monograph by Slater [20] or the survey paper of
Schlosser [19], may be translated into the finite-field setting using the same methods.
We carry out this process, proving several identities for the period functions $\mathbb{P}_D^{(n)}$
and hypergeometric functions $\mathbb{F}_D^{(n)}$. Among other things, these include a finite-field
analogue of the Pfaff-Kummer transformation,

$$F_1\left[ a;\ b_1, b_2;\ c \,\middle|\, x, y \right] = (1 - x)^{-b_1} (1 - y)^{-b_2}\, F_1\left[ c - a;\ b_1, b_2;\ c \,\middle|\, \frac{x}{x - 1},\ \frac{y}{y - 1} \right],$$

and Euler's transformation,

$$F_1\left[ a;\ b_1, b_2;\ c \,\middle|\, x, y \right] = (1 - x)^{c - a - b_1} (1 - y)^{-b_2}\, F_1\left[ c - a;\ c - b_1 - b_2, b_2;\ c \,\middle|\, x,\ \frac{x - y}{1 - y} \right],$$

which hold for all $a, b_1, b_2, c \in \mathbb{C}$ and all $x, y$ for which the series are defined.
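Both classical transformations can be checked numerically from the defining double series of Appell's $F_1$. The sketch below assumes the Pfaff-Kummer form with prefactor $(1-x)^{-b_1}(1-y)^{-b_2}$ and arguments $x/(x-1)$, $y/(y-1)$, and the Euler form with prefactor $(1-x)^{c-a-b_1}(1-y)^{-b_2}$ and arguments $x$, $(x-y)/(1-y)$. The function `appell_f1` is a truncated series written for this illustration and is only suitable for small real arguments; it does not touch the finite-field analogues.

```python
def appell_f1(a, b1, b2, c, x, y, terms=80):
    """Truncated double series for Appell's F1, intended for small |x|, |y|."""
    total = 0.0
    row = 1.0  # the (m, 0) term of the double series
    for m in range(terms):
        term = row
        for n in range(terms):
            total += term
            # ratio of the (m, n+1) term to the (m, n) term
            term *= (a + m + n) * (b2 + n) / ((c + m + n) * (n + 1)) * y
        # ratio of the (m+1, 0) term to the (m, 0) term
        row *= (a + m) * (b1 + m) / ((c + m) * (m + 1)) * x
    return total

def pfaff_kummer_rhs(a, b1, b2, c, x, y):
    """Right-hand side of the Pfaff-Kummer transformation."""
    pref = (1 - x) ** (-b1) * (1 - y) ** (-b2)
    return pref * appell_f1(c - a, b1, b2, c, x / (x - 1), y / (y - 1))

def euler_rhs(a, b1, b2, c, x, y):
    """Right-hand side of Euler's transformation."""
    pref = (1 - x) ** (c - a - b1) * (1 - y) ** (-b2)
    return pref * appell_f1(c - a, c - b1 - b2, b2, c, x, (x - y) / (1 - y))

params = (0.3, 0.4, 0.5, 1.2, 0.2, 0.1)  # a, b1, b2, c, x, y (arbitrary sample)
print(appell_f1(*params), pfaff_kummer_rhs(*params), euler_rhs(*params))
```

The sample parameters are arbitrary; any real values with small $|x|$, $|y|$ and $c$ not a non-positive integer should behave the same way.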

We note that another version of finite-field Appell-Lauricella functions is
independently defined by He [12] and Li et al. [16], which closely follows Greene's
definition. For their version, they establish several degree 1 transformation and
reduction formulas, including some that are analogous to the identities we prove.


Sequences, Modular Forms and Cellular Integrals

Dermot McCarthy, Robert Osburn, and Armin Straub 


Abstract It is well-known that the Apéry sequences which arise in the irrationality
proofs for $\zeta(2)$ and $\zeta(3)$ satisfy many intriguing arithmetic properties and are related
to the pth Fourier coefficients of modular forms. Here, we briefly indicate that the 
connection to modular forms persists for sequences associated to Brown’s cellular 
integrals and state a general conjecture concerning supercongruences. 


1 Introduction and Statement of Results 


Recently, Brown [4] introduced a program where period integrals on the moduli
space $\mathcal{M}_{0,N}$ of curves of genus 0 with $N$ marked points play a central role in
understanding irrationality proofs of values of the Riemann zeta function. The main
idea of [4] is to associate a rational function $f_\sigma$ and a differential $(N-3)$-form $\omega_\sigma$
to a given permutation $\sigma = \sigma_N$ on $\{1, 2, \dots, N\}$. Consider the cellular integral

$$I_\sigma(n) := \int_{S_N} f_\sigma^{\,n}\, \omega_\sigma,$$

where

$$S_N = \{(t_1, \dots, t_{N-3}) \in \mathbb{R}^{N-3} : 0 < t_1 < \cdots < t_{N-3} < 1\}.$$

For "convergent" $\sigma$, the integral $I_\sigma(n)$ converges and, for $n = 0$, we obtain the
cell-zeta values $\zeta_\sigma(N-3) = I_\sigma(0)$ studied in [5], which are multiple zeta values
of weight $N-3$. More generally, Brown [4] showed that $I_\sigma(n)$ is a $\mathbb{Q}$-linear
combination of multiple zeta values of weight less than or equal to $N-3$. If this
linear combination is of the form $A_\sigma(n)\zeta_\sigma(N-3)$, for some rational $A_\sigma(n)$, plus
a combination of multiple zeta values of weight less than $N-3$, then we say that
$A_\sigma(n)$ is the leading coefficient of the cellular integral $I_\sigma(n)$.

This construction recovers Beukers' integrals [3] which appear in the irrationality
proofs of $\zeta(2)$ and $\zeta(3)$ and the Apéry numbers

$$a(n) = \sum_{k=0}^{n} \binom{n}{k}^2 \binom{n+k}{k}, \qquad b(n) = \sum_{k=0}^{n} \binom{n}{k}^2 \binom{n+k}{k}^2$$

as leading coefficients. This framework also raises some natural questions. Is there
an analogue of the results in [1] and [2] to higher weight modular forms? Do the
leading coefficients $A_\sigma(n)$ satisfy supercongruences akin to those in [6]?

In [7], we prove that there is a higher weight version of Theorem 5 in [2] and that
Theorem 3 in [1] can be extended to all odd weights greater than or equal to 3. Based
on numerical evidence, we also conjecture that for each $N \geq 5$ and convergent $\sigma_N$,
the leading coefficients $A_{\sigma_N}(n)$ satisfy

$$A_{\sigma_N}(mp^r) \equiv A_{\sigma_N}(mp^{r-1}) \pmod{p^{3r}}$$

for all primes $p \geq 5$ and integers $m, r \geq 1$.
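
The shape of this congruence can be tried out on the two classical Apéry sequences, for which it is a known theorem rather than a conjecture. The following is a minimal Python sketch; the function names are illustrative choices, not notation from [7], and the check says nothing about general $\sigma_N$.

```python
from math import comb


def apery_a(n):
    """Apery numbers attached to zeta(2): sum_k C(n,k)^2 C(n+k,k)."""
    return sum(comb(n, k) ** 2 * comb(n + k, k) for k in range(n + 1))


def apery_b(n):
    """Apery numbers attached to zeta(3): sum_k C(n,k)^2 C(n+k,k)^2."""
    return sum(comb(n, k) ** 2 * comb(n + k, k) ** 2 for k in range(n + 1))


def supercongruence_holds(seq, p, m, r):
    """Test seq(m p^r) == seq(m p^(r-1)) modulo p^(3r) with exact integers."""
    return (seq(m * p**r) - seq(m * p ** (r - 1))) % p ** (3 * r) == 0


if __name__ == "__main__":
    for p in (5, 7):
        for m in (1, 2):
            for r in (1, 2):
                print(p, m, r,
                      supercongruence_holds(apery_a, p, m, r),
                      supercongruence_holds(apery_b, p, m, r))
```

Exact integer arithmetic is used throughout, so the modulus $p^{3r}$ is tested without rounding.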
Some Supercongruences for Truncated
Hypergeometric Series


Ling Long and Ravi Ramakrishna

Abstract We prove various supercongruences involving truncated hypergeometric
sums. These include a strengthened version of a conjecture of van Hamme. Our
method is to employ various hypergeometric transformation and evaluation formulae
to convert the truncated sums to quotients of $\Gamma$-values. We then convert these to
quotients of $\Gamma_p$-values and use Taylor's Theorem to make p-adic approximations.
In the cases under consideration higher order coefficients often vanish, leading to the
supercongruences.


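In [3], van Hamme conjectured that

$${}_7F_6\left[\begin{matrix} \frac76 & \frac13 & \frac13 & \frac13 & \frac13 & \frac13 & \frac13 \\ & \frac16 & 1 & 1 & 1 & 1 & 1 \end{matrix};\, 1\right]_{p-1} = \sum_{k=0}^{p-1} (6k+1)\left(\frac{(1/3)_k}{k!}\right)^6 \equiv -p\,\Gamma_p(1/3)^9 \bmod p^4 \quad \text{for } p \equiv 1 \bmod 6.$$

We have, in [2], proved

Theorem 1 Let p > 11. Then

$${}_7F_6\left[\begin{matrix} \frac76 & \frac13 & \frac13 & \frac13 & \frac13 & \frac13 & \frac13 \\ & \frac16 & 1 & 1 & 1 & 1 & 1 \end{matrix};\, 1\right]_{p-1} \equiv -p\,\Gamma_p(1/3)^9 \bmod p^6 \quad \text{for } p \equiv 1 \bmod 6$$

and

$${}_7F_6\left[\begin{matrix} \frac76 & \frac13 & \frac13 & \frac13 & \frac13 & \frac13 & \frac13 \\ & \frac16 & 1 & 1 & 1 & 1 & 1 \end{matrix};\, 1\right]_{p-1} \equiv -\frac{10}{27}\, p^4\, \Gamma_p(1/3)^9 \bmod p^6 \quad \text{for } p \equiv 5 \bmod 6.$$

Note these congruences cover (almost) all primes and are stronger than the van
Hamme Conjecture. We proved a number of other supercongruences including the
${}_3F_2$ ones below:

Theorem 2

$${}_3F_2\left[\begin{matrix} \frac13 & \frac13 & \frac13 \\ & 1 & 1 \end{matrix};\, 1\right]_{p-1} \equiv \Gamma_p(1/3)^6 \bmod p^3 \quad \text{for } p \equiv 1 \bmod 6$$

and

$${}_3F_2\left[\begin{matrix} \frac13 & \frac13 & \frac13 \\ & 1 & 1 \end{matrix};\, 1\right]_{p-1} \equiv -\frac{p^2}{3}\, \Gamma_p(1/3)^6 \bmod p^3 \quad \text{for } p \equiv 5 \bmod 6.$$
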


For $p \equiv 1 \bmod 6$ the right side of the above congruence corresponds to Dwork's unit
root for ordinary primes of a certain modular form that is part of the corresponding
hypergeometric motive.
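
The ${}_3F_2$ congruences modulo $p^3$ can be tested numerically for small primes. The sketch below is an illustration under stated assumptions, not the method of [2]: it evaluates Morita's $\Gamma_p$ at a positive integer congruent to $1/3$ modulo $p^3$ (using that $\Gamma_p$ is compatible with congruences modulo $p^k$ for odd $p$), and all function names are hypothetical.

```python
from fractions import Fraction


def gamma_p_at_integer(n, p, modulus):
    """Morita's Gamma_p(n) for an integer n >= 1, reduced modulo `modulus`:
    (-1)^n times the product of all j in [1, n) not divisible by p."""
    prod = 1
    for j in range(1, n):
        if j % p:
            prod = prod * j % modulus
    return (prod if n % 2 == 0 else -prod) % modulus


def gamma_p_one_third(p, k):
    """Gamma_p(1/3) mod p^k, via an integer n with 3n = 1 mod p^k (Python 3.8+)."""
    modulus = p**k
    n = pow(3, -1, modulus)
    return gamma_p_at_integer(n, p, modulus)


def truncated_3F2(p, k):
    """sum_{j=0}^{p-1} ((1/3)_j / j!)^3 reduced modulo p^k."""
    modulus = p**k
    total, term = Fraction(0), Fraction(1)
    for j in range(p):
        total += term**3
        term *= Fraction(3 * j + 1, 3 * (j + 1))  # ratio of consecutive (1/3)_j / j!
    return total.numerator * pow(total.denominator, -1, modulus) % modulus


def theorem2_holds(p):
    """Compare both sides of the 3F2 congruence modulo p^3 for a prime p >= 5."""
    modulus = p**3
    g6 = pow(gamma_p_one_third(p, 3), 6, modulus)
    if p % 6 == 1:
        rhs = g6
    else:
        rhs = -p * p * pow(3, -1, modulus) * g6 % modulus
    return truncated_3F2(p, 3) == rhs


if __name__ == "__main__":
    for p in (5, 7, 11, 13):
        print(p, theorem2_holds(p))
```

Only the sixth power of $\Gamma_p(1/3)$ enters, so the check is insensitive to the sign convention chosen for $\Gamma_p$.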

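We outline the strategies for $p \equiv 1 \bmod 6$. Various minor differences (and one
medium-sized technical issue in Theorem 1) arise for $p \equiv 5 \bmod 6$. The idea in
both theorems is to perturb the entries so that the series naturally truncate at $\frac{p-1}{3}$
(for $p \equiv 1 \bmod 6$).

Let $\zeta_3$ be a primitive cube root of unity. For instance in Theorem 2 we study

$${}_3F_2\left[\begin{matrix} \frac{1-p}{3} & \frac{1-\zeta_3 p}{3} & \frac{1-\zeta_3^2 p}{3} \\ & 1 & 1 \end{matrix};\, 1\right]$$

for $p \equiv 1 \bmod 6$. The corresponding infinite series
truncates at $\frac{p-1}{3}$. The Galois symmetry and a simple congruence argument imply

$${}_3F_2\left[\begin{matrix} \frac{1-p}{3} & \frac{1-\zeta_3 p}{3} & \frac{1-\zeta_3^2 p}{3} \\ & 1 & 1 \end{matrix};\, 1\right] \equiv {}_3F_2\left[\begin{matrix} \frac13 & \frac13 & \frac13 \\ & 1 & 1 \end{matrix};\, 1\right]_{p-1} \bmod p^3.$$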


At this point we can use the Pfaff-Saalschütz formula (see, for instance, Theorem 2.2.6 of [1])

$${}_3F_2\left[\begin{matrix} -n & a & b \\ & c & 1+a+b-c-n \end{matrix};\, 1\right] = \frac{(c-a)_n (c-b)_n}{(c)_n (c-a-b)_n}$$

with $n = \frac{p-1}{3}$ to write ${}_3F_2\left[\frac{1-p}{3}, \frac{1-\zeta_3 p}{3}, \frac{1-\zeta_3^2 p}{3};\, 1, 1;\, 1\right]$ as a quotient of $\Gamma$-values. One can
rewrite this as a quotient of $\Gamma_p$-values and then use a Taylor approximation to get
the desired result.
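
As a sanity check on the shape of the Pfaff-Saalschütz formula, both sides can be compared in exact rational arithmetic. This is a minimal sketch with arbitrarily chosen rational parameters (the values of $n, a, b, c$ below are illustrative assumptions, not the $\zeta_3$-perturbed entries used in the argument).

```python
from fractions import Fraction


def poch(x, n):
    """Pochhammer symbol (x)_n = x (x+1) ... (x+n-1)."""
    result = Fraction(1)
    for i in range(n):
        result *= x + i
    return result


def saalschutz_lhs(n, a, b, c):
    """Terminating 3F2[-n, a, b; c, 1+a+b-c-n; 1] as an exact fraction."""
    d = 1 + a + b - c - n
    total = Fraction(0)
    factorial = 1
    for k in range(n + 1):
        if k:
            factorial *= k
        total += (poch(Fraction(-n), k) * poch(a, k) * poch(b, k)
                  / (poch(c, k) * poch(d, k) * factorial))
    return total


def saalschutz_rhs(n, a, b, c):
    """(c-a)_n (c-b)_n / ((c)_n (c-a-b)_n)."""
    return poch(c - a, n) * poch(c - b, n) / (poch(c, n) * poch(c - a - b, n))


if __name__ == "__main__":
    args = (3, Fraction(1, 2), Fraction(1, 3), Fraction(5, 4))
    print(saalschutz_lhs(*args), saalschutz_rhs(*args))
```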
For Theorem 1 a similar argument with primitive 5th roots of unity and Dougall's
formula below, which holds when $1 + 2a = b + c + d + e + f$,

$${}_7F_6\left[\begin{matrix} a & 1+\frac{a}{2} & b & c & d & e & f \\ & \frac{a}{2} & 1+a-b & 1+a-c & 1+a-d & 1+a-e & 1+a-f \end{matrix};\, 1\right]$$

$$= \frac{(a+1)_{-f}\,(a-b-c+1)_{-f}\,(a-b-d+1)_{-f}\,(a-c-d+1)_{-f}}{(a-b+1)_{-f}\,(a-c+1)_{-f}\,(a-d+1)_{-f}\,(a-b-c-d+1)_{-f}},$$

gives the van Hamme congruences mod $p^5$.
To obtain the congruence mod $p^6$ involves an extra argument. It is not difficult
to show that the terminating ${}_7F_6$ series with entries perturbed to $\frac13 - \zeta_5^i x$, for $\zeta_5$ a
primitive 5th root of unity, lies in $\mathbb{Z}_p[[x^5]]$.

Call this power series $G(x)$. Using a result of Bailey relating ${}_9F_8$ expressions one
can in fact prove the above series is in $p\mathbb{Z}_p[[x^5]]$. A somewhat subtle argument is
required when $p \equiv 5 \bmod 6$ to obtain the divisibility of $G(x)$ by $p$.

Since $p \mid G(x)$, $G(0) \equiv G(p/3) \bmod p^6$. It is easy to show that $G(0)$ is
congruent to the left side of Theorem 1 mod $p^6$. The argument using Dougall's
formula gives that $G(p/3)$ is congruent to the right side of Theorem 1 mod $p^6$.
The Explicit Formula and a Motivic
Splitting


David P. Roberts 


Abstract We apply the Guinand-Weil-Mestre explicit formula to resolve two 
questions about how a certain hypergeometric motive splits into two irreducible 
motives. 


1 Introduction 


The classical explicit formula of Guinand and Weil was generalized to a broader 
context by Mestre in [2]. This formula applies to any L-function satisfying standard 
analytic properties, and gives a family of formulas for its conductor N. Mestre used 
it to get lower bounds on conductors of abelian varieties. This extended abstract 
gives an example of how it can be used in more exotic motivic contexts. 

The example we pursue here has the form $M = M_6 \oplus M_8$, the factor motives
being indexed by their degree. We assume that the associated L-functions really
do have the required analytic properties, and work numerically to a precision that
is adequate for being very confident in the assertions. Presently, we can compute
directly with M, but not with the individual factors. We know that its conductor is
$\mathrm{cond}(M) = 2^{15}$ and its local L-factor at 2 is just 1. These numerics imply that one
of $M_6$ and $M_8$ is tame at 2, and the other is minimally wild. Also we know the order
of central vanishing is rank(M) = 2. This raises two questions:


Q1: $(\mathrm{rank}(M_6), \mathrm{rank}(M_8))$ can only be (2, 0), (1, 1), or (0, 2). Which is correct?
Q2: $(\mathrm{cond}(M_6), \mathrm{cond}(M_8))$ can only be $(2^6, 2^9)$ or $(2^7, 2^8)$. Which is correct?


The answers are given in the table at the end of this extended abstract. We provide 
enough computational details so that the reader can both reproduce our answers and 
attempt analogous calculations for other split motives. 
2 The Motive $M = M_6 \oplus M_8$


One of the points of the talk was to illustrate how the Magma hypergeometric 
motives package by Mark Watkins lets one compute with hypergeometric motives 
of large degree. We use Magma language here as well [1], and the reader can repeat 
most computations using the free online Magma calculator. 

To obtain the motive M and its L-function L, type 


M:=HypergeometricData ( 
[1/2: i in [1..16]], [0: i in [1..16]]); 
L:=LSeries (M,1:Precision:=10,BadPrimes:=[<2,15,1>]); 


Here Magma correctly understands that M has good reduction outside of 2. The
optional argument ensures that it has the correct data at 2 as well, that being
conductor $2^{15}$ and local L-factor 1. Other possibilities failing badly, correctness
of the choice <2,15,1> is confirmed by CheckFunctionalEquation(L)
returning 0.0000000000. The command HodgeStructure(L:PHV) says that
M has weight w = 15 with Hodge vector

$$(h^{15,0}, \dots, h^{0,15}) = (1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1).$$

In particular M can only appear in the cohomology of varieties of dimension $\geq 15$.
In general, if d is even and the $\alpha_j$'s and the $\beta_j$'s are obtained from one
another by adding 1/2 modulo $\mathbb{Z}$, then $H(\alpha, \beta \mid 1)$ decomposes as a sum of two
motives of specified degrees. In our case, we know a priori that $M = M_6 \oplus M_8$.
Factorization(EulerFactor(L,3)) then yields $f_3(x)$ in 2s:

$$(1 - 268 \cdot 3x + 204193 \cdot 3^4 x^2 - 1001800 \cdot 3^9 x^3 + 204193 \cdot 3^{19} x^4 - 268 \cdot 3^{31} x^5 + 3^{45} x^6)$$
$$\cdot\, (1 + 2992x + 39116 \cdot 3^4 x^2 - 7596496 \cdot 3^6 x^3 - 203836426 \cdot 3^{12} x^4 - 7596496 \cdot 3^{21} x^5 + 39116 \cdot 3^{34} x^6 + 2992 \cdot 3^{45} x^7 + 3^{60} x^8).$$

Thus, $M_6$ and $M_8$ are both irreducible. Moreover Newton-over-Hodge forces the
Hodge vector of M to decompose nicely into $h_6 + h_8$ with

$$h_6 = (0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0),$$
$$h_8 = (1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1).$$


Likewise, but in 30s, 8 min, and 2.5h now, 


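$$f_5(x) = (1 + \cdots + 5^{45}x^6)(1 - \cdots + 5^{60}x^8),$$
$$f_7(x) = (1 + 248232 \cdot 7x + \cdots + 7^{45}x^6)(1 + 667104x + \cdots + 7^{60}x^8),$$
$$f_{11}(x) = (1 - 883812 \cdot 11x + \cdots + 11^{45}x^6)(1 + 344985445x + \cdots + 11^{60}x^8).$$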
Any two of the $f_p(x)$ are completely different Galois-theoretically, implying that the
two factors $M_j$ each have motivic Galois group as large as possible, namely $\mathrm{GSp}_j$.

L has a functional equation with respect to $s \leftrightarrow 16 - s$. Sign(L) immediately
returns 1, so the analytic rank r of L is even. Evaluate(L,8) takes 4s and
returns 0.000000000, so $r \geq 2$. Evaluate(L,8:Derivative:=2) takes 14s
and returns 7.851654518, so r = 2. The Hardy Z-function Z(t) is a vertically
rescaled version of L(M, 8 + ti). On [0, 7] it graphs out to

[Figure: graph of Z(t) on [0, 7]]

The double zero at t = 0 is visible. The next three zeros are $\gamma_1 \approx 1.93195000805$,
$\gamma_2 \approx 3.00559765$, and $\gamma_3 \approx 3.61679$. Note that this calculation does not give any
hints as to the desired factorization $Z(t) = Z_6(t)Z_8(t)$. In other words, we do not
know which motive a given $\gamma_j$ belongs to.


3 The Explicit Formula 


Let M be a motive of odd weight w with L-function assumed to satisfy the Riemann
hypothesis. Then its Hodge vector h, conductor N, analytic rank r, Frobenius traces
$c_{p^e} = \mathrm{Tr}(\mathrm{Fr}_p^e \mid M)$, and zeros $1/2 + \gamma_k i$ in the upper half plane are related by

$$\log N = 2\pi r \hat{F}(0) + 4\pi \sum_k \hat{F}(\gamma_k) + 4 \sum_{j>0} h^j \int_0^\infty \hat{F}(t) E_j(t)\, dt + 2 \sum_{p^e} \frac{c_{p^e}}{p^{e(w+1)/2}}\, F(e \log p) \log p.$$

Here F is an allowed test function, $E_j(t) = \log 2\pi - \Psi((1+j)/2 + it)$ with
$\Psi(s) = \mathrm{Re}(\Gamma'(s)/\Gamma(s))$, and $h^{p-q} = h^{p,q}$.
The standard Odlyzko test function and its Fourier transform are

$$F_{\mathrm{Od}}(x) = \chi_{[-1,1]}(x)\left((1 - |x|)\cos(\pi x) + \frac{\sin(\pi |x|)}{\pi}\right), \qquad \hat{F}_{\mathrm{Od}}(t) = \frac{4\pi \cos^2(t/2)}{(\pi^2 - t^2)^2}.$$

Also allowed are the scaled functions $F_z(x) = F_{\mathrm{Od}}(x/\log z)$ and their Fourier
transforms $\hat{F}_z(t) = (\log z)\hat{F}_{\mathrm{Od}}(t \log z)$.
4 Applying the Explicit Formula to $M_6$ and $M_8$


Computing $c_{p^e}$ for our motive M is easily done by Magma. However, to get the
decomposition $c_{p^e} = c^6_{p^e} + c^8_{p^e}$, even for just e = 1, we need to factor $f_p(x)$, which
we can do only for $p \leq 11$. From the factorizations above, one has $c^6_3 = 268 \cdot 3$,
$c^6_9 = (268 \cdot 3)^2 - 2(204193 \cdot 3^4)$, etc. The explicit formula using $(F_{13}, F_{13})$, with
all terms divided by log 2 for greater clarity, answers Questions 1 and 2:


          (Tends to 6 or 7)       (Tends to 8 or 9)

          Term_6     Total_6      Term_8     Total_8     Comments
h         3.11142    3.11142      4.85928    4.85928     Hodge contribution
3         0.17011    3.28154     -0.63306    4.22622     Contributions
5        -0.35472    2.92682      0.07245    4.29867     from the successively
7        -0.07386    2.85296     -0.02836    4.27031     harder factorizations
9        -0.02269    2.83027      0.00183    4.27214     of Frobenius
11        0.00028    2.83055     -0.00101    4.27114     polynomials f_p(x)
r         2.99946    5.83002      2.99946    7.27060     Forced! A1: (1, 1)
γ_1                  5.83002      1.68061    8.95121     Forced! A2: (2^6, 2^9)
γ_2       0.13610    5.96612                 8.95121     Forced!
Total                6.00000                 9.00000


Terms are positive starting with the line beginning r, and so these terms must be
associated with either $M_6$ or $M_8$ so as to keep $(\mathrm{total}_6, \mathrm{total}_8)$ coordinatewise less
than either (6, 9) or (7, 8). This forces the indicated answers. Thus, both motives
have analytic rank 1. The prime 2 is tamely ramified in $M_6$ and minimally wildly
ramified in $M_8$.
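
Several entries of the table can be spot-checked directly from the explicit formula. The sketch below is illustrative only and rests on assumptions that are inferred rather than stated outright: the scaling parameter is taken to be z = 13 for both columns, prime-power terms are normalized by $p^{e(w+1)/2}$ with w = 15, and $c_9$ is obtained from the two lowest coefficients of each factor of $f_3(x)$ via Newton's identity. The Hodge term is not computed.

```python
import math

LOG2 = math.log(2.0)
Z = 13.0  # assumed scaling parameter z of the test function F_z
W = 15    # weight of the motive M


def F_od(x):
    """Odlyzko test function, supported on [-1, 1]."""
    ax = abs(x)
    if ax > 1.0:
        return 0.0
    return (1.0 - ax) * math.cos(math.pi * x) + math.sin(math.pi * ax) / math.pi


def F_od_hat(t):
    """Fourier transform of F_od in the normalization of the text."""
    return 4.0 * math.pi * math.cos(t / 2.0) ** 2 / (math.pi**2 - t**2) ** 2


def rank_term(r, z=Z):
    """2 pi r F_z_hat(0), divided by log 2."""
    return 2.0 * math.pi * r * math.log(z) * F_od_hat(0.0) / LOG2


def zero_term(gamma, z=Z):
    """4 pi F_z_hat(gamma), divided by log 2."""
    lz = math.log(z)
    return 4.0 * math.pi * lz * F_od_hat(gamma * lz) / LOG2


def prime_term(c, p, e, z=Z, w=W):
    """2 c F_z(e log p) log p / p^(e(w+1)/2), divided by log 2."""
    x = e * math.log(p) / math.log(z)
    return 2.0 * c * F_od(x) * math.log(p) / p ** (e * (w + 1) / 2) / LOG2


# Frobenius traces at 3 and 9 read off from the two factors of f_3(x).
c3_6 = 268 * 3
c3_8 = -2992
c9_6 = c3_6**2 - 2 * 204193 * 3**4   # Newton: p_2 = e_1^2 - 2 e_2
c9_8 = c3_8**2 - 2 * 39116 * 3**4

if __name__ == "__main__":
    print("r      ", round(rank_term(1), 5))
    print("3      ", round(prime_term(c3_6, 3, 1), 5), round(prime_term(c3_8, 3, 1), 5))
    print("9      ", round(prime_term(c9_6, 3, 2), 5), round(prime_term(c9_8, 3, 2), 5))
    print("gamma_1", round(zero_term(1.93195000805), 5))
    print("gamma_2", round(zero_term(3.00559765), 5))
```

Under these assumptions the printed values can be compared with the rows labelled r, 3, 9, γ_1 and γ_2 of the table.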

Remarkably, the talk just described relates directly to two collaborative projects 
begun at the MATRIX Institute. The decomposition studied here is the d = 16 case 
of the sequence of decompositions mentioned in numbered sentence 2 in §4 in the 
abstract with Rodriguez Villegas. The Hodge vectors $h_6$ and $h_8$ also arise for the
L-functions denoted L16 and Lj: in the abstract with Broadhurst; conductors there 
are $1260 = 2^2 \cdot 3^2 \cdot 5 \cdot 7$ and $7560 = 2^3 \cdot 3^3 \cdot 5 \cdot 7$ respectively.
Hypergeometric Supercongruences


David P. Roberts and Fernando Rodriguez Villegas 


Abstract We discuss two related principles for hypergeometric supercongruences, 
one related to accelerated convergence and the other to the vanishing of Hodge 
numbers. 


1 Introduction 


At the conference, we added two related principles to the study of supercongruences 
involving the polynomials obtained by truncating hypergeometric series. By a 
supercongruence we mean a congruence which somewhat unexpectedly remains 
valid when the prime modulus p is replaced by $p^r$ for some integer r > 1. We call
r the depth of the supercongruence. 

The first principle is that a supercongruence is the first instance of a sequence 
of similar supercongruences, reflecting accelerated convergence of certain Dwork 
quotients. The second is that splittings of underlying motives can be viewed as the 
conceptual source of supercongruences, with the depth of the congruence being 
governed by the vanishing of Hodge numbers. 

We present these principles here in a limited context, so that they can be seen as
clearly as possible. Let $\alpha = (\alpha_1, \dots, \alpha_d)$ be a length d vector of rational numbers
in (0, 1) and let $\beta = 1^d = (1, \dots, 1)$. We assume that multiplication by any integer
coprime to the least common multiple m of the denominators of the $\alpha_i$'s preserves
the multiset $\{\alpha_1, \dots, \alpha_d\}$ modulo $\mathbb{Z}$.

The associated classical hypergeometric series and its p-power truncations, for
p prime, are as follows.

$$F(\alpha, 1^d \mid t) := \sum_{k=0}^{\infty} \frac{(\alpha_1)_k \cdots (\alpha_d)_k}{k!^d}\, t^k, \qquad F_s(\alpha, 1^d \mid t) := \sum_{k=0}^{p^s - 1} \frac{(\alpha_1)_k \cdots (\alpha_d)_k}{k!^d}\, t^k.$$

Our starting point was the list CY3 of fourteen $\alpha = (\alpha_1, \dots, \alpha_4)$ associated to
certain families of Calabi-Yau threefolds discussed in [8]. Each has a corresponding
normalized Hecke eigenform $f = \sum a_n q^n$ of weight four and trivial character. For
each, it was conjectured in [8] that

$$F_1(\alpha, 1^4 \mid 1) \equiv a_p \bmod p^3, \qquad p \nmid m a_p. \qquad (1)$$

Some of these cases have been settled. For example, the case $\alpha = (1/5, 2/5, 3/5, 4/5)$
was proved by McCarthy [6], the corresponding modular form having level
25 [9]. Just before submitting this note, Long et al. [5] announced two different
proofs of (1) for all fourteen cases in CY3.
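
Congruence (1) is easy to test numerically in one case. The sketch below takes $\alpha = (1/2, 1/2, 1/2, 1/2)$ and assumes, as an outside fact not stated in this note, that the associated eigenform is the level 8, weight 4 eta product $\eta(2z)^4\eta(4z)^4$; the function names are illustrative.

```python
from fractions import Fraction


def level8_weight4_coeffs(N):
    """Coefficients a[0..N] of eta(2z)^4 eta(4z)^4
    = q * prod_{n>=1} (1 - q^(2n))^4 (1 - q^(4n))^4."""
    c = [0] * (N + 1)
    c[0] = 1
    for step in (2, 4):
        for m in range(step, N + 1, step):
            for _ in range(4):
                # multiply the truncated series in place by (1 - q^m)
                for i in range(N, m - 1, -1):
                    c[i] -= c[i - m]
    return [0] + c[:N]  # shift by the leading factor q


def truncated_F1(p, modulus):
    """F_1(alpha, 1^4 | 1) = sum_{k<p} ((1/2)_k / k!)^4 reduced modulo `modulus`."""
    total, term = Fraction(0), Fraction(1)
    for k in range(p):
        total += term**4
        term *= Fraction(2 * k + 1, 2 * (k + 1))
    return total.numerator * pow(total.denominator, -1, modulus) % modulus


def congruence_1_holds(p, a_p):
    """Check F_1 == a_p modulo p^3 for an odd prime p."""
    modulus = p**3
    return truncated_F1(p, modulus) == a_p % modulus


if __name__ == "__main__":
    a = level8_weight4_coeffs(13)
    for p in (3, 5, 7, 13):
        if a[p] % p:  # hypothesis p does not divide m * a_p, with m = 2
            print(p, a[p], congruence_1_holds(p, a[p]))
```

The modular inverse uses the three-argument `pow` with exponent -1, which needs Python 3.8 or later.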


2 Convergence to the Unit Root and Hodge Gaps 


The two principles stem from observations about common behavior of the examples
in CY3. The first observation is that each supercongruence (1) seems to be part of a
sequence. Dwork proved [2] that for $p \nmid m$

$$\frac{F_{s+1}(\alpha, 1^d \mid t)}{F_s(\alpha, 1^d \mid t^p)} \equiv \frac{F_s(\alpha, 1^d \mid t)}{F_{s-1}(\alpha, 1^d \mid t^p)} \bmod p^s, \qquad s > 0. \qquad (2)$$

Moreover, the rational functions $F_{s+1}(\alpha, 1^d \mid t)/F_s(\alpha, 1^d \mid t^p)$ converge as $s \to \infty$
to a Krasner analytic function which can be evaluated at a Teichmüller representative
Teich(t) which is not a zero of $F_1$, giving the unit root $\gamma_p$ of the corresponding
local L-series at p.

For $\alpha \in$ CY3, computations suggest

$$\frac{F_s(\alpha, 1^4 \mid 1)}{F_{s-1}(\alpha, 1^4 \mid 1)} \equiv \gamma_p \bmod p^{3s}, \qquad p \nmid m a_p, \quad s > 0, \qquad (3)$$

where $\gamma_p \in \mathbb{Z}_p$ is the root of $T^2 - a_p T + p^3$ not divisible by p. Note that the case
s = 1 reduces to (1) since $\gamma_p \equiv a_p \bmod p^3$.

Our second observation is that the appearance of a congruence to a power $p^{3s}$ as
opposed to the expected $p^s$ is related to Hodge theory. Consider the hypergeometric
family of motives $H(\alpha, 1^d \mid t)$ (see [1] for a computer implementation). For any
$t \in \mathbb{P}^1(\mathbb{Q}) \setminus \{0, 1, \infty\}$ the motive $H(\alpha, 1^d \mid t)$ is defined over $\mathbb{Q}$, has rank d, weight
d − 1 and its only non-zero Hodge numbers are $(h^{d-1,0}, \dots, h^{0,d-1}) = (1, \dots, 1)$.
When t = 1 there is a mild degeneration and the rank drops to d − 1.

For $\alpha \in$ CY3, the motive for t = 1 is the direct sum, up to semi-simplification,
of a Tate motive $\mathbb{Q}(-1)$ and the motive A = M(f) of the corresponding
Hecke eigenform f of weight four. The Hodge numbers of A are (1, 0, 0, 1).
We view the gap of three between the initial 1 and the next 1 as explaining the
supercongruences (3).


3 A Congruence of Depth Five 


To illustrate our two observations further, we use the decomposition established in
[3, Cor. 2.1] for the case $\alpha = (1/2, 1/2, 1/2, 1/2, 1/2, 1/2)$. We learned at the
conference that this example was recently studied further by Osburn et al. [7], who
proved (4) below for s = 1 modulo $p^3$ and report that Mortenson conjectured it
modulo $p^5$.

Again after semisimplifying, the motive $H(\alpha, 1^6 \mid 1)$ has a distinguished summand
isomorphic to the Tate motive $\mathbb{Q}(-2)$ of rank 1 and weight 4. The complement
of this $\mathbb{Q}(-2)$ breaks up into two pieces A and B. They are both rank 2 motives of
weight 5. Namely, $A = M(f_6)$ is the motive associated to the unique normalized
eigenform $f_6 = \sum_{n \geq 1} a_n q^n$ of level 8 and weight 6 and $B = M(f_4)(-1)$ is a Tate
twist of the motive associated to the unique normalized eigenform $f_4 = \sum_{n \geq 1} b_n q^n$
of level 8 and weight 4. The LMFDB [4] conveniently gives data on modular forms,
including the $a_n$ and $b_n$ here.

The trace of $\mathrm{Frob}_p$ on the full rank 5 motive $H(\alpha, 1^6 \mid 1)$ is given by

$$a_p + b_p\, p + p^2.$$

Numerically, we observe the following supercongruences

$$\frac{F_s(\alpha, 1^6 \mid 1)}{F_{s-1}(\alpha, 1^6 \mid 1)} \equiv \gamma_p \bmod p^{5s}, \qquad p \nmid 2a_p, \quad s \geq 1, \qquad (4)$$

where $\gamma_p \in \mathbb{Z}_p$ is the root of $T^2 - a_p T + p^5$ not divisible by p.

The Hodge numbers for A and B are (1, 0, 0, 0, 0, 1) and (1, 0, 0, 1) respectively,
with the gap of five in the Hodge numbers for A nicely matching the exponent of
the supercongruences.
4 A Summarizing Conjecture


We now state a conjecture that generalizes the situations discussed so far.


Conjecture 1 For fixed $t = \pm 1$, let A be the unique submotive of $H(\alpha, 1^d \mid t)$ with
$h^{0,d-1}(A) = 1$ and let r be the smallest positive integer such that $h^{r,d-1-r}(A) = 1$.
For $p \nmid m$ such that $F_1(\alpha, 1^d \mid t) \in \mathbb{Z}_p^\times$, let $\gamma_p$ be the unit root of A. Then

$$\frac{F_s(\alpha, 1^d \mid t)}{F_{s-1}(\alpha, 1^d \mid t)} \equiv \gamma_p \bmod p^{rs}, \qquad s \geq 1. \qquad (5)$$

In particular, for s = 1 we have

$$F_1(\alpha, 1^d \mid t) \equiv a_p \bmod p^r, \qquad (6)$$

where $a_p$ is the trace of $\mathrm{Frob}_p$ acting on A.


1. For generic $\alpha$, t we expect r = 1 and (5) follows (see (2) and the subsequent
paragraph). For the conjecture to predict r > 1, the motive has to split
appropriately.
2. For $\alpha = (1/2, \dots, 1/2)$ and $t = (-1)^d$ the motive $H(\alpha, 1^d \mid t)$ acquires an
involution and we expect r = 2 for any d > 7; all numerical evidence is
consistent with this assertion.
3. For large d the unit roots involved are not in general related to classical modular
forms since the motives A will typically have degrees greater than two.
and Hypergeometric Motives


Charles F. Doran, Tyler L. Kelly, Adriana Salerno, Steven Sperber, 
John Voight, and Ursula Whitcher 


Abstract Mirror symmetry predicts surprising geometric correspondences between 
distinct families of algebraic varieties. In some cases, these correspondences have 
arithmetic consequences. Among the arithmetic correspondences predicted by 
mirror symmetry are correspondences between point counts over finite fields, and 
more generally between factors of their Zeta functions. In particular, we will discuss 
our results on a common factor for Zeta functions of alternate families of invertible 
polynomials. We will also explore closed formulas for the point counts for our 
alternate mirror families of K3 surfaces and their relation to their Picard—Fuchs 
equations. Finally, we will discuss how all of this relates to hypergeometric motives. 
This report summarizes work from two papers. 
1 Motivation


Calabi-Yau varieties, those smooth projective varieties with trivial canonical
bundle, provide a rich and interesting source of arithmetic and geometry. Calabi-Yau
varieties of dimension 1 are elliptic curves, ubiquitous in mathematics and
theoretical physics. In dimensions two and above, we take our Calabi-Yau varieties
to be simply connected. The two-dimensional Calabi-Yau varieties are better known
as K3 surfaces, after the mathematicians Kummer, Kähler, and Kodaira and the
mountain K2. Like elliptic curves, K3 surfaces are all diffeomorphic to each other,
but the study of their complex and arithmetic structure remains deep. The study of
higher dimensional Calabi-Yau varieties promises the same rewards in many areas
of mathematics.

It is particularly important to study Calabi-Yau varieties in families, and
interesting families of Calabi-Yau varieties arise in several ways. Perhaps the
simplest method of obtaining Calabi-Yau varieties is to take smooth hypersurfaces
of degree n + 1 in projective space $\mathbb{P}^n$. A natural generalization of this construction is to
take anticanonical hypersurfaces or complete intersections in certain toric varieties.
Often, however, one wishes to consider subfamilies with further special properties.
For example, a general smooth quartic in $\mathbb{P}^3$ has Picard rank 1, but a general member
of the pencil of K3 surfaces given by

$$x_0^4 + x_1^4 + x_2^4 + x_3^4 - 4\psi x_0 x_1 x_2 x_3 = 0$$

has Picard rank 19 and the Fermat quartic

$$x_0^4 + x_1^4 + x_2^4 + x_3^4 = 0,$$

where $\psi = 0$, has Picard rank 20. As often happens, these special geometric
properties are correlated with enhanced symmetry: each member of the pencil
admits an action by the group $(\mathbb{Z}/4\mathbb{Z})^2$, and the Fermat quartic admits an action
by a group of 384 elements [18, 19].

Calabi-Yau manifolds are also interesting from a physical perspective. Indeed, 
string theory posits that our universe consists of four space-time dimensions 
together with six extra, compact real dimensions which take the shape of a Calabi— 
Yau variety. Physicists have produced several consistent candidate theories, using 
properties of the underlying varieties. These theories are linked by dualities which 
transform physical observables described by one collection of geometric data into 
equivalent observables described by different geometric data. Attempts to build a 
mathematically consistent description of the duality between Type IIA and Type
IIB string theories led to the thriving field of mirror symmetry, which is based on
the philosophy that the complex moduli of a given family of Calabi-Yau varieties
should correspond to the complexified Kähler moduli of a mirror family.
There are several methods of constructing the mirror to a family of Calabi-Yau
varieties. The first mirror symmetry construction, due to Greene–Plesser [15], used
a $(\mathbb{Z}/5\mathbb{Z})^3$ action on the one-parameter family of quintic threefolds $X_\psi$ given by

$$x_0^5 + x_1^5 + x_2^5 + x_3^5 + x_4^5 - 5\psi x_0 x_1 x_2 x_3 x_4 = 0$$

to construct the mirror family $Y_\psi$ to all smooth quintic hypersurfaces in $\mathbb{P}^4$. A
directly analogous construction can be used to find the mirrors to families of
Calabi-Yau hypersurfaces in weighted projective spaces. Batyrev gave combinatorial
methods for constructing mirror families to Calabi-Yau varieties realized
as hypersurfaces or complete intersections in toric varieties [1]. Though powerful,
Batyrev's construction relates families rather than individual varieties. In the current
work, we use an alternative generalization of the Greene–Plesser construction due
to Berglund–Hübsch–Krawitz [2, 17], allowing for a direct comparison of varieties
on either side of the mirror correspondence.

When individual pairs of mirror varieties can be identified, mirror symmetry
constructions have implications for their arithmetic and geometric structure. These
implications were first explored by Candelas–de la Ossa–Rodriguez-Villegas [4] for
their zeta functions, the generating function for the number of $\mathbb{F}_{p^r}$-valued points

$$Z(X, T) := \exp\left(\sum_{r=1}^{\infty} \#X(\mathbb{F}_{p^r})\, \frac{T^r}{r}\right)$$

for a variety X over $\mathbb{F}_p$; we have $Z(X, T) \in \mathbb{Q}(T)$ by a theorem of Dwork [10].
These authors used the Greene–Plesser mirror construction and techniques from
toric varieties to compare the zeta function of fibers of the diagonal Fermat pencil
of threefolds $X_\psi$ and the mirror pencil of threefolds $Y_\psi$ [3-5]. They found that for
general $\psi$, the zeta functions of $X_\psi$ and $Y_\psi$ share a common factor $R(T, \psi)$. This
common factor is related to the period of the holomorphic form on $X_\psi$, and the
number of points on $X_\psi$ over a finite field is given by a truncation of a generalized
hypergeometric function which solves the Picard–Fuchs equation associated to the
holomorphic form. Furthermore, the other nontrivial factors of $Z(X_\psi, T)$ were
closely related to the action of $(\mathbb{Z}/5\mathbb{Z})^3$ on homogeneous monomials.

The Greene–Plesser construction generalizes easily to smooth hypersurfaces of
degree n + 1 in $\mathbb{P}^n$. Wan [20] has characterized the relationship between a member
$X_\psi$ of the diagonal Fermat pencil in $\mathbb{P}^n$ and its mirror $Y_\psi$ in terms of point counts
via the congruence

$$\#X_\psi(\mathbb{F}_q) \equiv \#Y_\psi(\mathbb{F}_q) \pmod{q}$$

for all $q = p^r$ such that $\mathbb{F}_q \supseteq \mathbb{F}_p(\psi)$. Fu–Wan [12] generalized this result to
other pairs of mirror pencils. More recently, Kloosterman [16] showed that one
can use a group action to describe the distinct factors of the zeta function for
any one-parameter monomial deformation of a diagonal hypersurface in weighted
projective space.

In our work, we take a slightly different approach. Rather than relating a pencil 
of Calabi-Yau varieties to its mirror, we instead consider those pencils whose 
mirrors are related in some geometric way. In other words, we seek to understand 
when common properties of mirrors translate into arithmetic, geometric, or physical 
implications for the original pencils themselves. 

There is an intricate relationship between Picard–Fuchs equations and the zeta
function, mediated by the action of the Frobenius map. Given a set of symmetric
pencils in $\mathbb{P}^n$ which yield alternate mirrors to smooth hypersurfaces of degree n + 1 in $\mathbb{P}^n$, we
hypothesize that the zeta functions of the members of each pencil and their mirror
should share a common factor, corresponding to the Picard–Fuchs equation satisfied
by the holomorphic form. In the current work, we apply the formalism of
Berglund–Hübsch–Krawitz mirror symmetry to characterize appropriate symmetric pencils,
and we study the resulting zeta functions.

We have followed four approaches, exploiting algebraic, geometric, and arith- 
metic properties of highly symmetric pencils. 


2 Common Factor Theorem 


Our first result, described in more detail in [8], is that invertible pencils whose
mirrors have common properties share arithmetic similarities as well. Revisiting
work of Gährs [14], we find that invertible pencils whose BHK mirrors are
hypersurfaces in quotients of the same weighted-projective space have the same
Picard–Fuchs equation associated to their holomorphic form. In turn, we show that
the Picard–Fuchs equations for the pencil dictate a factor of the zeta functions of the
pencil.
An invertible polynomial is a polynomial

$$F_A = \sum_{i=0}^{n} \prod_{j=0}^{n} x_j^{a_{ij}} \in \mathbb{Z}[x_0, \dots, x_n],$$

where $A = (a_{ij})_{i,j}$ is an $(n+1) \times (n+1)$ matrix with nonnegative integer
entries, such that:

* $\det(A) \neq 0$,
* the polynomial $F_A$ is homogeneous of degree n + 1, and
* the function $F_A : \mathbb{C}^{n+1} \to \mathbb{C}$ has exactly one singular point at the origin.

We further impose that these hypersurfaces are Calabi-Yau varieties, so the
degree of the polynomial $F_A$ is n + 1.
Inspired by Berglund–Hübsch–Krawitz (BHK) mirror symmetry, we look at the
weights of the transposed polynomial

$$F_{A^T} = \sum_{i=0}^{n} \prod_{j=0}^{n} x_j^{a_{ji}},$$

which will be a quasihomogeneous polynomial, i.e., there exist nonnegative integral
weights $q_0, \dots, q_n$ so that $\gcd(q_0, \dots, q_n) = 1$ and $F_{A^T}$ defines a hypersurface
$X_{A^T}$ in the weighted-projective space $W\mathbb{P}^n(q_0, \dots, q_n)$. We call $q_0, \dots, q_n$ the dual
weights of $F_A$. Let $d^T = \sum_i q_i$ be the sum of the weights.
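
The dual weights can be computed from the exponent matrix by linear algebra: quasihomogeneity of $F_{A^T}$ says that every monomial has the same weighted degree, i.e. $\sum_j a_{ji} q_j$ is independent of i. The following is a minimal sketch of that computation in exact arithmetic; the function name and the example matrices (read off from the quartic pencils with dual weights (1, 1, 1, 1), plus one small two-variable example) are illustrative assumptions.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def dual_weights(A):
    """Primitive integer weights q with sum_j A[j][i] * q[j] constant in i,
    i.e. the weights making the transposed polynomial quasihomogeneous.
    Assumes det(A) != 0."""
    n = len(A)
    # Augmented system A^T w = (1, ..., 1), solved by Gauss-Jordan elimination.
    M = [[Fraction(A[j][i]) for j in range(n)] + [Fraction(1)] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    w = [M[i][n] for i in range(n)]
    # Clear denominators and divide by the gcd to get primitive integer weights.
    scale = reduce(lambda a, b: a * b // gcd(a, b), (x.denominator for x in w), 1)
    ints = [int(x * scale) for x in w]
    g = reduce(gcd, ints)
    return [v // g for v in ints]


# Exponent matrices (rows = monomials, columns = variables x_0..x_3).
F4 = [[4, 0, 0, 0], [0, 4, 0, 0], [0, 0, 4, 0], [0, 0, 0, 4]]
F1L3 = [[4, 0, 0, 0], [0, 3, 1, 0], [0, 0, 3, 1], [0, 1, 0, 3]]
L4 = [[3, 1, 0, 0], [0, 3, 1, 0], [0, 0, 3, 1], [1, 0, 0, 3]]

if __name__ == "__main__":
    for name, A in (("F4", F4), ("F1L3", F1L3), ("L4", L4)):
        print(name, dual_weights(A))
    print(dual_weights([[2, 1], [0, 3]]))  # x0^2 x1 + x1^3, transposed: x0^2 + x0 x1^3
```

For the last matrix the transposed polynomial $x_0^2 + x_0 x_1^3$ is quasihomogeneous of degree 6 with weights (3, 1), which is what the function returns.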

Using the dual weights, we define a one-parameter deformation of our invertible
pencil. Consider the polynomials

$$F_{A,\psi} = \sum_{i=0}^{n} \prod_{j=0}^{n} x_j^{a_{ij}} - d^T \psi\, x_0 \cdots x_n \in \mathbb{Z}[\psi][x_0, \dots, x_n].$$

We then have a family of hypersurfaces $X_{A,\psi} := Z(F_{A,\psi}) \subset \mathbb{P}^n$ in the parameter
$\psi$, which we call an invertible pencil.

The Picard–Fuchs equation for the family $X_{A,\psi}$ is determined completely by
the dual weights by work of Gährs [14, Theorem 3.6]. Indeed, Gährs computes the
order of the Picard–Fuchs equation in terms of the $q_i$. There is an explicit formula
for the order D(q) of the Picard–Fuchs equation that depends solely on the (n + 1)-tuple
of dual weights $q = (q_0, \dots, q_n)$. The Picard–Fuchs equation itself depends
solely on q as well. To be precise, we observe that the Picard–Fuchs equation is a
hypergeometric differential equation whose motive descends to $\mathbb{Q}$.

For a smooth projective hypersurface X in $\mathbb{P}^n$, the zeta function is of the form

$$Z(X, T) = \frac{P_X(T)^{(-1)^n}}{(1 - T)(1 - qT) \cdots (1 - q^{n-1}T)}$$

with $P_X(T) \in \mathbb{Q}[T]$. Our main result exhibits a (fiber-wise) common factor of the
zeta function in the general setting suggested above.


Theorem 1 Let $X_{A,\psi}$ and $X_{B,\psi}$ be invertible pencils of Calabi-Yau (n − 1)-folds in
$\mathbb{P}^n$, determined by integer matrices A and B, respectively. Suppose A and B have
the same dual weights $q_i$. Then for each $\psi \in \mathbb{F}_q$ such that the fibers $X_{A,\psi}$ and $X_{B,\psi}$
are smooth and $\gcd(q, (n+1)d^T) = 1$, the polynomials $P_{X_{A,\psi}}(T)$ and $P_{X_{B,\psi}}(T)$
have a common factor $R_\psi(T) \in \mathbb{Q}[T]$ with $\deg R_\psi(T) = D(q)$.
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3 Explicit Computations and Hypergeometric Motives 


We next focus our attention on invertible families of K3 surfaces, with dual weights 
(1, 1, 1, 1), which are as follows: 

In [9] we analyze the zeta functions of the families given in Table 1. Using 
a classical viewpoint, we find that the hypergeometricity of the Picard—Fuchs 
equations associated to the five families predicts a motivic decomposition of the 
point counts over finite fields for our families. We see that the hypergeometric 
Picard—Fuchs equations for the primitive middle cohomology of the five families 
correspond to nontrivial hypergeometric summands in the point counts over finite 
fields. The core of this paper is the following theorem: 


Theorem 2 Let $\diamond \in \mathcal{F} = \{F_4, F_2L_2, F_1L_3, L_2L_2, L_4\}$ signify one of the five K3
families in Table 1. There is a canonical decomposition of the finite field point count
$N_{\mathbb{F}_q}(X_{\diamond,\psi})$ whose summands are either trivial or hypergeometric. Moreover,
there exists an element in $H^2_{\mathrm{prim}}(X_{\diamond,\psi})$ that satisfies a hypergeometric Picard-Fuchs
differential equation with parameters $\alpha_1, \ldots, \alpha_n; \beta_1, \ldots, \beta_{n-1}$ if and only if there
exists a nontrivial summand in the canonical finite field point count $N_{\mathbb{F}_q}(X_{\diamond,\psi})$
corresponding to the hypergeometric function defined over $\mathbb{F}_q$ with parameters
$\alpha_1, \ldots, \alpha_n; \beta_1, \ldots, \beta_{n-1}$.


This proof is done explicitly. First, we find the Picard-Fuchs equations via
the diagrammatic method introduced in [4, 5] and fully developed in [7]. After
establishing the hypergeometric forms of the Picard-Fuchs equations, we confirm
that they do indeed correspond to those in the finite field point counts, which we
compute with Gauss sums using a classical method due to Delsarte [6] and
Furtado Gomide [13].

Additionally, we obtain finer information by factoring the polynomial $Q_{\diamond,\psi,q}(T)$
in Theorem 1 further, giving a complete hypergeometric decomposition. Our result
is as follows.


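Table 1 The symmetric quartic K3 pencils with dual weights (1, 1, 1, 1)

Family    Quartic                                                                  Symmetries
$F_4$     $x_0^4 + x_1^4 + x_2^4 + x_3^4 - 4\psi x_0x_1x_2x_3 = 0$                 $(\mathbb{Z}/4\mathbb{Z})^2$
$F_1L_3$  $x_0^4 + x_1^3x_2 + x_2^3x_3 + x_3^3x_1 - 4\psi x_0x_1x_2x_3 = 0$         $\mathbb{Z}/7\mathbb{Z}$
$F_2L_2$  $x_0^4 + x_1^4 + x_2^3x_3 + x_3^3x_2 - 4\psi x_0x_1x_2x_3 = 0$           $\mathbb{Z}/8\mathbb{Z}$
$L_2L_2$  $x_0^3x_1 + x_1^3x_0 + x_2^3x_3 + x_3^3x_2 - 4\psi x_0x_1x_2x_3 = 0$     $\mathbb{Z}/4\mathbb{Z}$
$L_4$     $x_0^3x_1 + x_1^3x_2 + x_2^3x_3 + x_3^3x_0 - 4\psi x_0x_1x_2x_3 = 0$     $\mathbb{Z}/5\mathbb{Z}$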
Corollary 1 The polynomials $Q_{\diamond,\psi,q}(T)$ factor over $\mathbb{Z}[T]$ according to the
following table.

Family    Factorization                            Hypothesis         r
$F_4$     $(\deg 1)^{12} (\deg 2)^3$               $q \equiv 1 \bmod 4$     2
$F_2L_2$  $(1 - qT)^8 (\deg 1)^2 (\deg 2)^4$       $q \equiv 1 \bmod 8$     4        (1)
$F_1L_3$  $(\deg 6)^3$                             $q \equiv 1 \bmod 28$    12
$L_2L_2$  $(1 - qT)^8 (\deg 2)^1 (\deg 4)^2$       $q \equiv 1 \bmod 4$     2
$L_4$     $(1 - qT)^2 (\deg 4)^4$                  $q \equiv 1 \bmod 20$    10

In the above table, there may be further factorization depending on $\psi$ and q,
and some of these factors may agree. The integer r in the above table is such that for
$q = p^r$, we have $Q_{\diamond,\psi,q}(T) = (1 - qT)^{18}$ under the hypotheses of Theorem A:
in other words, if we factor $Q_{\diamond,\psi,q}(T)$ as a product of cyclotomic polynomials $\Phi_{m_i}$,
then $\mathrm{lcm}(m_i) \mid r$.

The case of the Dwork pencil F4 is due to Dwork [11, §6j, p. 73], and in this 
case we know that the degree 2 factor occurs with multiplicity 3 and the linear 
factor occurs with multiplicity 12, as the notation indicates. The factorization in 
Corollary 1 is motivated by similar work due to Candelas—de la Ossa—Rodriguez- 
Villegas [4, 5]. (Kloosterman [16] has shown that one can use a group action to 
describe the distinct factors of the zeta function for any one-parameter monomial 
deformation of a diagonal hypersurface in weighted projective space; only one of 
our families, the Dwork pencil, fits within the scope of this work.) 
Schwarzian Equations and Equivariant Functions


Abdellah Sebbar 


Abstract In this review article we show how the theory of Schwarzian differential
equations leads to an interesting class of meromorphic functions on the upper
half-plane $\mathbb{H}$ named equivariant functions. These functions have the property that
their Schwarz derivatives are weight 4 automorphic forms for a discrete subgroup $\Gamma$
of $\mathrm{PSL}_2(\mathbb{R})$. It turns out that these functions must satisfy the relation

$$f(\gamma\tau) = \rho(\gamma) f(\tau), \quad \tau \in \mathbb{H}, \ \gamma \in \Gamma,$$

where $\rho$ is a 2-dimensional complex representation of $\Gamma$ and the matrix action
on both sides is by linear fractional transformation. When $\rho$ is the identity
representation $\rho(\gamma) = \gamma$, the equivariant functions are parameterized by scalar
automorphic forms, while if $\rho$ is an arbitrary representation they are parameterized
by vector-valued automorphic forms with multiplier $\rho$. If $\Gamma$ is a modular subgroup
we obtain important applications to modular forms for $\Gamma$ as well as a description
in terms of elliptic function theory. We also prove the existence of equivariant
functions for the most general case by constructing a vector bundle attached to the
data $(\Gamma, \rho)$ and applying the Kodaira vanishing theorem.


1 The Schwarz Derivative 


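Let D be a domain in $\mathbb{C}$ and f a meromorphic function on D. The Schwarz
derivative or the Schwarzian of f is defined by

$$\{f, z\} = \left(\frac{f''}{f'}\right)' - \frac{1}{2}\left(\frac{f''}{f'}\right)^2 = \frac{f'''}{f'} - \frac{3}{2}\left(\frac{f''}{f'}\right)^2.$$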
It was named after Schwarz by Cayley; however, Schwarz himself pointed out
that it was discovered by Lagrange in 1781 [10]. The Schwarz derivative has
many interesting properties, which are given below. The functions involved are
meromorphic functions on a domain D.


• Projective invariance:

$$\left\{\frac{af + b}{cf + d}, z\right\} = \{f, z\}, \quad a, b, c, d \in \mathbb{C}, \ ad - bc \neq 0.$$

• Cocycle property: If w is a function of z, then

$$\{f, z\} = \{f, w\}\,(dw/dz)^2 + \{w, z\}.$$

• $\{f, z\} = 0$ if and only if $f(z) = \dfrac{az + b}{cz + d}$ for some $a, b, c, d \in \mathbb{C}$.

• If $w = \dfrac{az + b}{cz + d}$ with $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{GL}_2(\mathbb{C})$, then

$$\{f, z\} = \{f, w\}\,\frac{(ad - bc)^2}{(cz + d)^4}.$$

• For two meromorphic functions f and g on D,

$$\{f, z\} = \{g, z\} \text{ if and only if } f(z) = \frac{a\,g(z) + b}{c\,g(z) + d}, \quad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{GL}_2(\mathbb{C}).$$

• If w(z) is a function of z with $w'(z_0) \neq 0$ for some $z_0 \in D$, then in a
neighborhood of $z_0$, we have

$$\{z, w\} = -\{w, z\}\,(dz/dw)^2.$$

Some of the properties are elementary and the rest follow from the following
important connection with the theory of ordinary differential equations:


Let R(z) be a meromorphic function on D and consider the second order
differential equation

$$y'' + \frac{1}{2} R(z)\, y = 0$$

with two linearly independent solutions $y_1$ and $y_2$. Then $f = y_1/y_2$ is a solution to
the Schwarz differential equation

$$\{f, z\} = R(z).$$

Conversely, if f(z) is locally univalent and $\{f, z\} = R(z)$, then $y_1 = f/\sqrt{f'}$ and
$y_2 = 1/\sqrt{f'}$ are two linearly independent solutions to $y'' + \frac{1}{2}R(z)\,y = 0$.
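A worked instance may help fix the normalization in this correspondence. The sketch below is an illustration added here, not part of the original text; it takes $f(z) = e^z$, chosen only because its derivatives are immediate, and checks both directions, using the standard formula $\{f, z\} = f'''/f' - \frac{3}{2}(f''/f')^2$.

```latex
f(z) = e^{z}: \quad f' = f'' = f''' = e^{z}
\;\Longrightarrow\;
\{f, z\} = \frac{f'''}{f'} - \frac{3}{2}\left(\frac{f''}{f'}\right)^{2}
         = 1 - \frac{3}{2} = -\frac{1}{2}.

R(z) = -\tfrac{1}{2}: \quad y'' + \tfrac{1}{2}R(z)\,y = y'' - \tfrac{1}{4}\,y = 0,
\qquad y_1 = e^{z/2}, \quad y_2 = e^{-z/2}, \qquad \frac{y_1}{y_2} = e^{z} = f(z).

\text{Conversely: } \frac{f}{\sqrt{f'}} = \frac{e^{z}}{e^{z/2}} = e^{z/2} = y_1,
\qquad \frac{1}{\sqrt{f'}} = e^{-z/2} = y_2 .
```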


The Schwarz derivative plays an important role in the study of the complex
projective line, univalent functions, conformal mapping, Teichmüller spaces and,
most importantly, in the theory of modular forms and hypergeometric functions
[1, 4, 6, 8, 9, 11].

We now look at the effect of the Schwarz derivative on automorphic functions for
a discrete subgroup $\Gamma$ of $\mathrm{PSL}_2(\mathbb{R})$, that is, a Fuchsian group of the first kind acting
on the upper half-plane $\mathbb{H} = \{\tau \in \mathbb{C} \mid \Im(\tau) > 0\}$ by linear fractional transformations.


Proposition 1.1 ([8]) If f is an automorphic function for a discrete group $\Gamma$, then
$\{f, \tau\}$ is a weight 4 automorphic form for $\Gamma$ that is holomorphic everywhere except
at the points where f has a multiple zero or a multiple pole (including at the cusps).
Moreover, if $\Gamma$ is of genus zero and f is a Hauptmodul, then $\{f, \tau\}$ is modular for
the normalizer of $\Gamma$ in $\mathrm{PSL}_2(\mathbb{R})$.


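As an example, let $\lambda$ be the Klein modular function for $\Gamma(2)$ given by

$$\lambda(\tau) = \left(\frac{\eta(\tau/2)}{\eta(2\tau)}\right)^8,$$

where $\eta$ is the Dedekind eta-function given by

$$\eta(\tau) = q^{1/24} \prod_{n \ge 1} (1 - q^n), \quad q = \exp(2\pi i \tau),$$

then

$$\{\lambda, \tau\} = \frac{\pi^2}{2} E_4(\tau),$$

where $E_4$ is the weight 4 Eisenstein series

$$E_4(\tau) = 1 + 240 \sum_{n \ge 1} \sigma_3(n)\, q^n,$$

with $\sigma_k(n)$ being the sum of the k-th powers of the positive divisors of n.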
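The identity $\{\lambda, \tau\} = \frac{\pi^2}{2}E_4(\tau)$ for $\lambda(\tau) = (\eta(\tau/2)/\eta(2\tau))^8$ can be probed numerically. The following is a minimal sketch, not a proof: it approximates the Schwarz derivative $f'''/f' - \frac{3}{2}(f''/f')^2$ by central finite differences and truncates all product and series expansions. The function names, the sample point, the step size and the truncation order are assumptions made here for illustration.

```python
import cmath

def eta(tau, terms=80):
    """Dedekind eta function via its truncated product expansion."""
    q = cmath.exp(2j * cmath.pi * tau)
    product = 1
    for n in range(1, terms):
        product *= 1 - q**n
    return cmath.exp(2j * cmath.pi * tau / 24) * product

def hauptmodul(tau):
    """The function (eta(tau/2) / eta(2 tau))^8 considered in the text."""
    return (eta(tau / 2) / eta(2 * tau)) ** 8

def eisenstein_e4(tau, terms=80):
    """E_4 via the Lambert series: sum sigma_3(n) q^n = sum n^3 q^n / (1 - q^n)."""
    q = cmath.exp(2j * cmath.pi * tau)
    return 1 + 240 * sum(n**3 * q**n / (1 - q**n) for n in range(1, terms))

def schwarzian(f, z, h=1e-3):
    """Central finite-difference approximation of f'''/f' - (3/2)(f''/f')^2."""
    d1 = (f(z + h) - f(z - h)) / (2 * h)
    d2 = (f(z + h) - 2 * f(z) + f(z - h)) / h**2
    d3 = (f(z + 2 * h) - 2 * f(z + h) + 2 * f(z - h) - f(z - 2 * h)) / (2 * h**3)
    return d3 / d1 - 1.5 * (d2 / d1) ** 2

tau = 0.1 + 1.0j  # an arbitrary sample point in the upper half-plane
lhs = schwarzian(hauptmodul, tau)
rhs = (cmath.pi**2 / 2) * eisenstein_e4(tau)
print(abs(lhs - rhs))  # small, limited by the finite-difference error
```

The agreement is only as good as the finite-difference scheme allows, so the printed difference should be read as a consistency check rather than an exact equality.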
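If $\Gamma = \Gamma_0(8)$ and we consider the Hauptmodul $f_8$ for $\Gamma$ given by

$$f_8(\tau) = \frac{\eta(4\tau)^{12}}{\eta(2\tau)^4\, \eta(8\tau)^8},$$

then

$$\frac{1}{2\pi^2}\{f_8, \tau\} = \frac{1}{4}\left(\theta_3^4(2\tau) + \theta_4^4(2\tau)\right)^2 = \theta_{D_4 \oplus D_4}(2\tau),$$

where $\theta_{D_4 \oplus D_4}(2\tau)$ is the theta function of two copies of the root lattice $D_4$ and $\theta_3$
and $\theta_4$ are the Jacobi theta-functions

$$\theta_3(\tau) = \sum_{n \in \mathbb{Z}} q^{n^2/2}, \quad \theta_4(\tau) = \sum_{n \in \mathbb{Z}} (-1)^n q^{n^2/2}.$$

For later use, we also give

$$\theta_2(\tau) = \sum_{n \in \mathbb{Z}} q^{(n + 1/2)^2/2}.$$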

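Finally, if $\Gamma = \Gamma_1(5)$ and $f_5$ is the Hauptmodul given by

$$f_5(\tau) = q \prod_{n \ge 1} (1 - q^n)^{5\left(\frac{n}{5}\right)},$$

where $\left(\frac{\cdot}{5}\right)$ is the Legendre symbol, then

$$\frac{1}{2\pi^2}\{f_5, \tau\} = \theta_{Q_8(1)}(\tau),$$

where $\theta_{Q_8(1)}$ is the theta function of the Icosian or Maass lattice $Q_8(1)$, which is the
8-dimensional 5-unimodular lattice with determinant 625 and minimal norm 4.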

Notice that in the above three examples, the Schwarz derivatives are all holomor- 
phic as the groups involved are torsion-free and thus their Hauptmoduls do not have 
multiple zeros or poles. 

One may ask if the converse of the above properties is true: suppose that
the Schwarz derivative $\{f, \tau\}$ of a meromorphic function f on $\mathbb{H}$ is a weight 4
automorphic form; what can be said about f? Does it have any automorphic properties?
The following sections will be devoted to elucidating this question.
2 Equivariant Functions


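Suppose that F is a weight 4 automorphic form for a discrete group $\Gamma$ and f is a
meromorphic function on $\mathbb{H}$ such that $\{f, \tau\} = F(\tau)$. Then using the properties of
the Schwarz derivative from the previous section we have, for
$\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma$,

$$(c\tau + d)^4 F(\tau) = F\!\left(\frac{a\tau + b}{c\tau + d}\right)
 = \left\{ f\!\left(\frac{a\tau + b}{c\tau + d}\right), \frac{a\tau + b}{c\tau + d} \right\}
 = (c\tau + d)^4 \left\{ f\!\left(\frac{a\tau + b}{c\tau + d}\right), \tau \right\}.$$

Therefore,

$$\left\{ f\!\left(\frac{a\tau + b}{c\tau + d}\right), \tau \right\} = \{f, \tau\}.$$

Hence, there exists $\begin{pmatrix} A & B \\ C & D \end{pmatrix} \in \mathrm{GL}_2(\mathbb{C})$ such that

$$f\!\left(\frac{a\tau + b}{c\tau + d}\right) = \frac{A f(\tau) + B}{C f(\tau) + D}.$$

This defines a 2-dimensional representation $\rho$ of $\Gamma$ in $\mathrm{GL}_2(\mathbb{C})$ such that

$$f(\gamma\tau) = \rho(\gamma) f(\tau) \qquad (1)$$

where on both sides the action of the matrices is by linear fractional transformation.
We will distinguish three cases: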
We will distinguish three cases: 


1. $\rho = 1$, a constant, in which case f is an automorphic function.

2. $\rho = \mathrm{Id}$, the embedding of $\Gamma$ in $\mathrm{GL}_2(\mathbb{C})$ or the defining representation of $\Gamma$,
providing a meromorphic function commuting with the action of $\Gamma$, which we
call an equivariant function for $\Gamma$.

3. $\rho$ is a general representation not equal to one of the above cases, giving a function
f called a $\rho$-equivariant function for $\Gamma$.


We will be interested in the last two cases. A trivial example of an equivariant
function for a discrete group $\Gamma$ is $f(\tau) = \tau$. We will see in the next section
that there are infinitely many examples parametrized by automorphic forms for $\Gamma$.
Furthermore, the set of equivariant functions will have a structure of an infinite
dimensional vector space isomorphic to the space of meromorphic sections of the
canonical bundle of the compact Riemann surface $X(\Gamma) = (\Gamma \backslash \mathbb{H})^*$, where the star
indicates that we have added the cusps to the quotient space; in other words, the
space of meromorphic differential forms on $X(\Gamma)$.

In the general case, we establish the existence of $\rho$-equivariant functions for an
arbitrary representation $\rho$ of $\Gamma$. This will include the case when $\rho : \Gamma \to \mathbb{C}^\times$ is
a character. Of course, when this character is unitary, then we recover the classical
automorphic functions with a character.


3 The Automorphic and Modular Aspects 


In this section we focus solely on the equivariant functions, that is, when $\rho$ is the
defining representation. We have already seen that $f(\tau) = \tau$ is equivariant for
every discrete group. It turns out that there are many more nontrivial equivariant
functions.


Theorem 3.1 ([5]) Let $\Gamma$ be a discrete group. We have

1. Let f be a nonzero automorphic form of weight k for $\Gamma$ (even with a character),
then

$$h_f(\tau) = \tau + k\,\frac{f(\tau)}{f'(\tau)} \qquad (2)$$

is an equivariant function for $\Gamma$.

2. Let h be an equivariant function for $\Gamma$. Then $h(\tau) = h_f(\tau)$ for some
automorphic form f with a character for $\Gamma$ if and only if the poles of $1/(h(\tau) - \tau)$
are simple with rational residues. Moreover, if $\Gamma$ has genus 0, then we can omit
the character from this statement.
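The first part of the theorem can be tested numerically for $\Gamma = \mathrm{SL}_2(\mathbb{Z})$ and $f = E_4$ (weight $k = 4$), with $h_f(\tau) = \tau + k f(\tau)/f'(\tau)$. The sketch below is an illustration under stated assumptions, not the method of [5]: it evaluates $E_4$ and its derivative from truncated Lambert series and checks equivariance under the two generators $\tau \mapsto \tau + 1$ and $\tau \mapsto -1/\tau$. The function names and the sample point are hypothetical choices made here.

```python
import cmath

def e4_with_derivative(tau, terms=80):
    """Return (E_4(tau), dE_4/dtau) from Lambert series in q = exp(2 pi i tau)."""
    q = cmath.exp(2j * cmath.pi * tau)
    value = 1 + 240 * sum(n**3 * q**n / (1 - q**n) for n in range(1, terms))
    # q d/dq of q^n / (1 - q^n) is n q^n / (1 - q^n)^2, and d/dtau = 2 pi i q d/dq
    q_derivative = 240 * sum(n**4 * q**n / (1 - q**n) ** 2 for n in range(1, terms))
    return value, 2j * cmath.pi * q_derivative

def h_e4(tau):
    """h_f(tau) = tau + k f(tau) / f'(tau) for f = E_4 and k = 4."""
    value, derivative = e4_with_derivative(tau)
    return tau + 4 * value / derivative

tau = 0.2 + 1.1j  # an arbitrary sample point in the upper half-plane
translation_defect = abs(h_e4(tau + 1) - (h_e4(tau) + 1))   # gamma = T
inversion_defect = abs(h_e4(-1 / tau) - (-1 / h_e4(tau)))   # gamma = S
print(translation_defect, inversion_defect)
```

Both defects should vanish up to rounding, reflecting $h(\gamma\tau) = \gamma\, h(\tau)$ with $\gamma$ acting by linear fractional transformation on both sides.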


If k is a nonzero integer and c is a nonzero constant, then $h_f = h_{f^k} = h_{cf}$ and
so the correspondence $f \mapsto h_f$ is not one-to-one. Because of the second part of the
theorem, an equivariant function that arises from an automorphic form as in (2) is
called a rational equivariant form. It turns out that for such an equivariant function
$h(\tau)$, the residues of $1/(h(\tau) - \tau)$ have bounded denominators, and any common
multiple of these denominators can be the weight for an automorphic form f such
that $h = h_f$. Moreover, not all equivariant functions are rational. An example of a
non-rational equivariant function is given by


E4(t) 


MOSES ATG) + ES 


where $E_6$ is the weight six Eisenstein series

$$E_6(\tau) = 1 - 504 \sum_{n \ge 1} \sigma_5(n)\, q^n.$$


n>1 
Indeed, one can show that $1/(h(\tau) - \tau)$ has a simple pole at the cubic root of unity,


. : TI 
but with residue — + —. 


The following theorem provides two important applications of equivariant 
functions to modular forms. 


Theorem 3.2 ([12]) Let f be a modular form of nonzero weight for a finite index
subgroup $\Gamma$ of $\mathrm{SL}_2(\mathbb{Z})$. Then

1. The derivative of f has infinitely many non-equivalent zeros in $\mathbb{H}$, all but a finite
number of which are simple.

2. The q-expansion of f, where q is the uniformizer at $\infty$ for $\Gamma$, cannot have only
a finite number of nonzero coefficients.

This theorem follows from the properties of the equivariant function $h_f$ attached to
f as in (2), the most important of which is that $h_f$ always takes real values. This is
clear if $h_f$ has a pole in $\mathbb{H}$, as it will take rational values at the orbit of this pole, but if
$h_f$ is holomorphic, then we have to apply the Denjoy-Wolff theorem to
the iterates of $h_f$. Then we prove that $h_f$ has infinitely many non-equivalent poles in
$\mathbb{H}$. To prove that the zeros are all simple except for a finite number of them requires
the use of the Rankin-Cohen brackets. The second statement is usually proven using
the L-function of the modular form, but here it is a simple consequence of the first
statement.

We end this section with an interesting connection with the cross-ratio, which is
defined for four distinct complex numbers $z_i$, $1 \le i \le 4$, by

$$[z_1, z_2, z_3, z_4] = \frac{(z_1 - z_2)(z_4 - z_3)}{(z_1 - z_3)(z_4 - z_2)}.$$

As it is projectively invariant, the cross-ratio of four distinct equivariant functions
for a discrete group $\Gamma$ is an automorphic function for $\Gamma$. As examples, we have

$$[\tau, h_{\theta_2}, h_{\theta_3}, h_{\theta_4}] = \lambda,$$

and

$$[\tau, h_{E_4}, h_{\Delta}, h_{E_6}] = \frac{1}{1728}\, j,$$

where $\Delta = \eta^{24}$ is the discriminant cusp form and the Dedekind j-function is given
by $j = E_4^3/\Delta$.
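The identity $[\tau, h_{E_4}, h_\Delta, h_{E_6}] = j/1728$, with $[z_1, z_2, z_3, z_4] = \frac{(z_1 - z_2)(z_4 - z_3)}{(z_1 - z_3)(z_4 - z_2)}$ and $h_f = \tau + k f/f'$, can be derived in a few lines. The sketch below is added for illustration and is not taken from the source; it assumes Ramanujan's differential identities for the Eisenstein series $E_2, E_4, E_6$ and the classical relation $E_4^3 - E_6^2 = 1728\,\Delta$.

```latex
% Ramanujan's identities, with D = \frac{1}{2\pi i}\frac{d}{d\tau}:
DE_4 = \tfrac{1}{3}\,(E_2E_4 - E_6), \qquad
DE_6 = \tfrac{1}{2}\,(E_2E_6 - E_4^2), \qquad
D\Delta = E_2\,\Delta .

% Hence, with c = 12/(2\pi i) and h_f = \tau + k f / f':
h_{E_4} - \tau = \frac{c\,E_4}{E_2E_4 - E_6}, \qquad
h_{\Delta} - \tau = \frac{c}{E_2}, \qquad
h_{E_6} - \tau = \frac{c\,E_6}{E_2E_6 - E_4^2}.

% Differences entering the cross-ratio:
h_{E_6} - h_{\Delta} = \frac{c\,E_4^2}{E_2\,(E_2E_6 - E_4^2)}, \qquad
h_{E_6} - h_{E_4} = \frac{c\,(E_4^3 - E_6^2)}{(E_2E_6 - E_4^2)(E_2E_4 - E_6)}.

% Therefore:
[\tau, h_{E_4}, h_\Delta, h_{E_6}]
 = \frac{(\tau - h_{E_4})(h_{E_6} - h_\Delta)}{(\tau - h_\Delta)(h_{E_6} - h_{E_4})}
 = \frac{E_4^3}{E_4^3 - E_6^2}
 = \frac{E_4^3}{1728\,\Delta}
 = \frac{j}{1728}.
```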
4 The Elliptic Aspect


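The ideas in this section originated in [3] and were developed further in [15] and
more recently in full generality in [2]. Let $L = \omega_1\mathbb{Z} + \omega_2\mathbb{Z}$ be a lattice in $\mathbb{C}$
with $\Im(\omega_2/\omega_1) > 0$. Its Weierstrass $\wp$-function is given by

$$\wp(z) = \frac{1}{z^2} + \sum_{\omega \in L \setminus \{0\}} \left( \frac{1}{(z - \omega)^2} - \frac{1}{\omega^2} \right)$$

and the Weierstrass $\zeta$-function is given by

$$\zeta(z) = \frac{1}{z} + \sum_{\omega \in L \setminus \{0\}} \left( \frac{1}{z - \omega} + \frac{1}{\omega} + \frac{z}{\omega^2} \right).$$

Notice that $\zeta'(z) = -\wp(z)$, and while $\wp$ is L-periodic, $\zeta$ is quasi-periodic with
respect to L in the sense that for $\omega \in L$ and $z \in \mathbb{C}$, we have

$$\zeta(z + \omega) = \zeta(z) + H_L(\omega)$$

where the quasi-period map $H_L$ depends on the lattice L. It is $\mathbb{Z}$-linear and so it is
determined by the quasi-periods $\eta_1 = H_L(\omega_1)$ and $\eta_2 = H_L(\omega_2)$. Moreover, $H_L$ is
homogeneous of weight $-1$ in the sense that if $\alpha \in \mathbb{C}^\times$, then

$$H_{\alpha L}(\alpha\omega) = \alpha^{-1} H_L(\omega).$$

The quasi-periods satisfy the Legendre relation

$$\omega_1\eta_2 - \omega_2\eta_1 = 2\pi i.$$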
We now suppose that $\omega_1 = 1$ and $\omega_2 = \tau \in \mathbb{H}$. Using the fact that $\mathrm{SL}_2(\mathbb{Z})$ acts on
L by isomorphisms (by a change of basis) and using the homogeneity of the
quasi-period map $H_L$, it is easy to see that

$$h_0(\tau) = \frac{\eta_2}{\eta_1}$$

is equivariant for $\mathrm{SL}_2(\mathbb{Z})$ [3].
In fact, from the expression of the Weierstrass $\zeta$-function one can prove that

$$\eta_1 = \frac{\pi^2}{3} E_2(\tau),$$

where $E_2$ is the weight 2 Eisenstein series

$$E_2(\tau) = 1 - 24 \sum_{n \ge 1} \sigma_1(n)\, q^n = \frac{1}{2\pi i} \frac{\Delta'(\tau)}{\Delta(\tau)}.$$

Therefore, using the Legendre relation, we get

$$h_0(\tau) = \tau + \frac{6}{i\pi E_2(\tau)} = \tau + 12\,\frac{\Delta(\tau)}{\Delta'(\tau)},$$

and thus $h_0 = h_\Delta$ is a rational equivariant function.
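The closed form $h_0(\tau) = \tau + 6/(i\pi E_2(\tau))$ lends itself to a quick numerical check of equivariance under $\tau \mapsto \tau + 1$ and $\tau \mapsto -1/\tau$. The sketch below is an illustration only; it evaluates $E_2$ from a truncated Lambert series, and the function names, truncation order and sample point are assumptions made here. It also uses the classical value $E_2(i) = 3/\pi$, which gives $h_0(i) = -i$.

```python
import cmath

def eisenstein_e2(tau, terms=60):
    """E_2 via the Lambert series: sum sigma_1(n) q^n = sum n q^n / (1 - q^n)."""
    q = cmath.exp(2j * cmath.pi * tau)
    return 1 - 24 * sum(n * q**n / (1 - q**n) for n in range(1, terms))

def h0(tau):
    """h_0(tau) = tau + 6 / (i pi E_2(tau))."""
    return tau + 6 / (1j * cmath.pi * eisenstein_e2(tau))

tau = 0.3 + 0.9j  # an arbitrary sample point in the upper half-plane
translation_defect = abs(h0(tau + 1) - (h0(tau) + 1))
inversion_defect = abs(h0(-1 / tau) - (-1 / h0(tau)))
value_at_i = h0(1j)
print(translation_defect, inversion_defect, value_at_i)
```

The value at $\tau = i$ is consistent with equivariance, since $i$ is fixed by $\tau \mapsto -1/\tau$ and $w = -i$ satisfies $w = -1/w$.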


Let us put

$$M_\tau = \begin{pmatrix} \tau & \eta_2 \\ 1 & \eta_1 \end{pmatrix}.$$

Then $M_\tau$ is invertible as $\det M_\tau = -2\pi i$ by the Legendre relation.

Let $\Gamma$ be a finite index subgroup of $\mathrm{SL}_2(\mathbb{Z})$ and denote by $\mathrm{Eq}(\Gamma)$ the set of all
equivariant functions for $\Gamma$ excluding the trivial one $h(\tau) = \tau$. Also denote
by $M_2(\Gamma)$ the set of all weight two meromorphic modular forms for $\Gamma$. We
have


Theorem 4.1 The map from $M_2(\Gamma)$ to $\mathrm{Eq}(\Gamma)$

$$f \mapsto M_\tau f$$

is a bijection, where $M_\tau f$ is the linear fractional transformation of f given by $M_\tau$.


The above map sends the zero modular form to $h_0$, which is equivariant for $\mathrm{SL}_2(\mathbb{Z})$
and hence for every subgroup. Recall that $h_0$ was built using the quasi-periods
of the Weierstrass $\zeta$-function. One might ask: what about the remaining equivariant
functions? Can they also arise from elliptic objects in the same way $h_0$ does? In
the paper [2], this question is fully answered: for each equivariant
function for $\Gamma$, one can construct generalizations of the Weierstrass $\zeta$-function,
called elliptic zeta functions, which are quasi-periodic maps on the set of lattices
such that the quotient of two fundamental quasi-periods is an equivariant function. The
interesting aspect is that there is a triangular commutative correspondence between
the set of these elliptic zeta functions, $M_2(\Gamma)$ and $\mathrm{Eq}(\Gamma)$, which encompasses the
modular and elliptic nature of equivariant functions.

As for the geometric aspect, because the weight 2 meromorphic modular forms
are identified with the meromorphic differential forms on the Riemann surface
$X(\Gamma)$, we can view the equivariant functions as the global meromorphic
sections of the canonical bundle of $X(\Gamma)$.
5 The General Case


In this section, we consider the case of a general discrete group $\Gamma$ and an arbitrary
representation $\rho : \Gamma \to \mathrm{GL}_2(\mathbb{C})$ and investigate the existence of $\rho$-equivariant
functions for $\Gamma$, that is, the meromorphic functions f on $\mathbb{H}$ such that

$$f(\gamma\tau) = \rho(\gamma) f(\tau).$$

We will denote the set of such functions by $\mathrm{Eq}(\Gamma, \rho)$. Let us first recall the notion
of vector-valued automorphic forms for the data $(\Gamma, \rho)$. A meromorphic function
$F = (f_1, f_2)^t : \mathbb{H} \to \mathbb{C}^2$, where $f_1$ and $f_2$ are two meromorphic functions on $\mathbb{H}$,
is called a 2-dimensional vector-valued automorphic form for $\Gamma$ of multiplier $\rho$ and
weight $k \in \mathbb{Z}$ if

$$(c\tau + d)^{-k} F(\gamma\tau) = \rho(\gamma) F(\tau), \quad \tau \in \mathbb{H}, \ \gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma,$$

in addition to the usual growth behavior at the cusps. Denote by $V_k(\Gamma, \rho)$ the space
of all such forms. They have been extensively studied in the last two decades by various
authors in different contexts from algebraic, arithmetic, analytic, geometric and theoretical
physics points of view; see [14] and the extensive list of references therein. Their
existence is well established in the literature for a unitary representation $\rho$ and for
$\Gamma$ being a subgroup of $\mathrm{SL}_2(\mathbb{Z})$ or a genus zero discrete group, among other cases.
The existence for arbitrary data $(\Gamma, \rho)$ has been recently proved in [14], even for
$\Gamma$ being a Fuchsian group of the second kind.
The first result of importance to us is


Theorem 5.1 ([13]) Let $F = (f_1, f_2)^t$ be a 2-dimensional vector-valued automorphic
form of multiplier $\rho$ and arbitrary weight for $\Gamma$. Then $h_F = f_1/f_2$ is a
$\rho$-equivariant function for $\Gamma$.
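The mechanism behind this statement is that the scalar automorphy factor cancels in the quotient of the two components. A short sketch, writing $j(\gamma, \tau)$ for that scalar factor and $A, B, C, D$ for the entries of $\rho(\gamma)$ (both are notation introduced here for illustration, not taken from [13]):

```latex
F(\gamma\tau) = j(\gamma,\tau)\,\rho(\gamma)\,F(\tau), \qquad
\rho(\gamma) = \begin{pmatrix} A & B \\ C & D \end{pmatrix}
\;\Longrightarrow\;
\begin{cases}
f_1(\gamma\tau) = j(\gamma,\tau)\,\bigl(A f_1(\tau) + B f_2(\tau)\bigr), \\
f_2(\gamma\tau) = j(\gamma,\tau)\,\bigl(C f_1(\tau) + D f_2(\tau)\bigr),
\end{cases}

h_F(\gamma\tau) = \frac{f_1(\gamma\tau)}{f_2(\gamma\tau)}
 = \frac{A f_1(\tau) + B f_2(\tau)}{C f_1(\tau) + D f_2(\tau)}
 = \frac{A\,h_F(\tau) + B}{C\,h_F(\tau) + D}
 = \rho(\gamma)\,h_F(\tau).
```

In particular the weight plays no role, which is why the theorem allows it to be arbitrary.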


This settles the question of the existence of $\rho$-equivariant functions, which is
then a consequence of the existence of vector-valued automorphic forms. A more
interesting result is that every $\rho$-equivariant function arises in this way.

Theorem 5.2 ([13]) The map from $V_k(\Gamma, \rho)$ to $\mathrm{Eq}(\Gamma, \rho)$ given by

$$F = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix} \mapsto h_F = f_1/f_2 \qquad (3)$$

is surjective.


Surprisingly, the proof uses almost all the properties of the Schwarz derivative, which
lead to the next theorem. If D is a domain in $\mathbb{C}$, and R(z) is a holomorphic function
on D, then we cannot guarantee the existence of two linearly independent global
solutions to the differential equation

$$y'' + R(z)\, y = 0$$

when D is not simply connected, and all we can hope for are local solutions.
However, when R(z) comes from a Schwarz derivative, then we have a different
outcome.

Theorem 5.3 Let D be a domain and f be a meromorphic function on D such that
$R(z) = \{f, z\}$ is holomorphic on D. Then the differential equation $y'' + R(z)\,y = 0$
has two linearly independent global solutions on D.

It is this important result and the use of the Bol identity that lead to the surjectivity
of the map (3).

So far we have established this close connection between $\rho$-equivariant functions
for $\Gamma$ and 2-dimensional vector-valued automorphic forms. All we need is to prove,
for arbitrary data $(\Gamma, \rho)$, the existence of such automorphic forms. To this end we
associate to $(\Gamma, \rho)$ a vector bundle $\mathcal{E} = \mathcal{E}_{\Gamma,\rho}$ over $X = X(\Gamma)$ constructed as
follows:

We choose a covering $\mathcal{U} = (U_i)_{i \in I}$ where I is the set of cusps and elliptic
fixed points on X. We then construct holomorphic maps $\psi_i : U_i \to \mathrm{GL}_2(\mathbb{C})$
having $\rho$ as a factor of automorphy [7]. This is carried out by solving the
Riemann-Hilbert problem over $U_i$ with the monodromy $\rho$. These maps yield a cocycle
$(F_{ij}) \in Z^1(\mathcal{U}, \mathrm{GL}(2, \mathcal{O}))$ to which is associated a rank two holomorphic vector
bundle $\mathcal{E}$ over X whose transition functions are the maps $F_{ij}$ on $U_i \cap U_j$.

Now if P is a given point (that can be a cusp) and $\mathcal{L}$ is the line bundle over X
corresponding to the divisor [P], then using the Kodaira vanishing theorem, there
exists an integer $\mu > 0$ such that

$$\dim H^0(X, \mathcal{O}(\mathcal{L}^\mu \otimes \mathcal{E})) \ge 2$$

where $\mathcal{O}(\mathcal{L}^\mu \otimes \mathcal{E})$ is the sheaf of holomorphic sections of $\mathcal{L}^\mu \otimes \mathcal{E}$, which
can be seen as sections in $H^0(X \setminus \{P\}, \mathcal{O}(\mathcal{E}))$ having a pole at P of order at most
$\mu$. Thus we have two linearly independent meromorphic sections of $\mathcal{E}$ with a single
pole at P. When lifted to $\mathbb{H} \cup \{\text{cusps}\}$ these sections yield two linearly independent
vector-valued automorphic functions (of weight 0) attached to $(\Gamma, \rho)$ with poles at
the fiber of P. The full details of the proof can be found for a higher dimension of the
representation in [14].
We have therefore established the following:

Theorem 5.4 For every discrete group $\Gamma$ and every 2-dimensional representation
$\rho$ of $\Gamma$, vector-valued $\Gamma$-automorphic functions of multiplier $\rho$ exist and so do
$\rho$-equivariant functions for $\Gamma$.
Abstract We discuss recent work of the authors in which we study the translation
of classical hypergeometric transformation and evaluation formulas to the finite field 
setting. 

Our approach is motivated by the desire for both an algorithmic type approach 
that closely parallels the classical case, and an approach that aligns with geometry. 
In light of these objectives, we focus on period functions in our construction which 
makes point counting on the corresponding varieties as straightforward as possible. 
finite field hypergeometric functions.
1 Motivation


In this talk we discuss recent work of the authors [9], in which we study the 
translation of classical hypergeometric transformation and evaluation formulas 
to the finite field setting. The theory of classical hypergeometric functions and 
hypergeometric functions over finite fields sits inside the broader framework of 
hypergeometric motives. Hypergeometric functions over finite fields have been 
developed by several people, including for example Evans [5, 6], Greene [10], 
Katz [11], and McCarthy [12], and a number of current developments have been 
discussed at this workshop including recent work of Roberts et al. [13], Doran et al. 
[4], and Beukers et al. [2], for example. 

Our approach to translation of classical hypergeometric transformation and 
evaluation formulas to the finite field setting is motivated by two strong desires: 


1. An algorithmic type approach that closely parallels the classical case (and does 
not require particular ingenuity for each example). 

2. An approach that aligns with geometry by explicitly interpreting the finite field 
hypergeometric functions in terms of Galois representations corresponding to 
associated algebraic varieties. 


In light of these objectives, we focus on period functions in our construction which 
makes point counting on the corresponding varieties as straightforward as possible. 

We are also motivated by previous joint work with Deines, Fuselier, Long, and Tu
[3] in which we study generalized Legendre curves $y^N = x^i(1 - x)^j(1 - \lambda x)^k$, using
periods to determine for certain N a condition for when the endomorphism algebra
of the primitive part of the associated Jacobian variety contains a quaternion algebra
over $\mathbb{Q}$. In most cases this involves computing Galois representations attached to the
Jacobian varieties using Greene's finite field hypergeometric functions.


2 Method 


From this perspective, the following approach is very natural. We slightly modify
the finite field hypergeometric function definition of Greene (or McCarthy) by
inductively using the Euler integral representation to construct our analogues. In
the classical setting, define the period function ${}_1P_0[a; z] := (1 - z)^{-a} = {}_1F_0[a; z]$,
and then use the Euler integral formula to define

$${}_2P_1[a, b; c; z] := \int_0^1 t^{b-1}(1 - t)^{c-b-1}\, {}_1P_0[a; zt]\, dt = B(b, c - b)\, {}_2F_1[a, b; c; z],$$

where $B(x, y) = \int_0^1 t^{x-1}(1 - t)^{y-1}\, dt$ is the beta function. To justify calling these
period functions we note that Wolfart [15] realized that if the parameters $a, b, c \in \mathbb{Q}$,
and $a, b, a - c, b - c \notin \mathbb{Z}$, then the integrals

$${}_2P_1[a, b; c; \lambda] \quad \text{and} \quad (-1)^{c-a-b-1}\lambda^{1-c}\, {}_2P_1[1 + b - c, 1 + a - c; 2 - c; \lambda]$$

are both periods of a generalized Legendre curve $y^N = x^i(1 - x)^j(1 - \lambda x)^k$, where
$N = \mathrm{lcd}(a, b, c)$ (least common denominator), $i = N \cdot (1 - b)$, $j = N \cdot (1 + b - c)$,
and $k = N \cdot a$.

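Inductively we define higher period functions ${}_{n+1}P_n$, gathering additional beta
function terms in front of the ${}_{n+1}F_n$. We can then use the following “dictionary”,
which is well known to experts, to translate these period functions to the finite field
setting. Here $q = p^e$ for a prime p, and $\widehat{\mathbb{F}_q^\times}$ denotes the group of multiplicative
characters on $\mathbb{F}_q^\times$, where each character A is extended to $\mathbb{F}_q$ by defining $A(0) = 0$.
Furthermore $N \in \mathbb{N}$, $a, b \in \mathbb{Q}$ with common denominator N, $\eta_N \in \widehat{\mathbb{F}_q^\times}$ has order
N, $\overline{A}$ denotes the complex conjugate of A, and $\zeta_p$ is a primitive pth root of unity.

$a = \frac{i}{N},\ b = \frac{j}{N}$     $\leftrightarrow$   $A, B \in \widehat{\mathbb{F}_q^\times},\ A = \eta_N^i,\ B = \eta_N^j$
$x^a$                                   $\leftrightarrow$   $A(x)$
$\int_0^1 dx$                           $\leftrightarrow$   $\sum_{x \in \mathbb{F}_q}$
$\Gamma(a)$                             $\leftrightarrow$   $g(A) := \sum_{x \in \mathbb{F}_q^\times} A(x)\, \zeta_p^{\mathrm{Tr}(x)}$
$B(a, b)$                               $\leftrightarrow$   $J(A, B) := \sum_{x \in \mathbb{F}_q} A(x) B(1 - x)$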
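The dictionary can be tried out on the classical relation $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$, whose finite field counterpart is the standard identity $J(A, B) = g(A)g(B)/g(AB)$, valid when $A$, $B$ and $AB$ are all nontrivial. The sketch below checks it over the prime field $\mathbb{F}_7$ with $g(A) = \sum_{x \neq 0} A(x)\zeta_p^x$ and $J(A, B) = \sum_x A(x)B(1 - x)$. The prime, the primitive root, and the helper names are assumptions chosen here for illustration; characters are realized as powers of one generator of the character group.

```python
import cmath

p = 7   # a small prime chosen for illustration
g = 3   # a primitive root modulo 7
discrete_log = {pow(g, k, p): k for k in range(p - 1)}
zeta_p = cmath.exp(2j * cmath.pi / p)

def char(e):
    """The character x -> chi(x)^e with chi(g) = exp(2 pi i/(p-1)); value 0 at x = 0."""
    def chi(x):
        x %= p
        if x == 0:
            return 0
        return cmath.exp(2j * cmath.pi * e * discrete_log[x] / (p - 1))
    return chi

def gauss_sum(A):
    """g(A) = sum over nonzero x of A(x) zeta_p^x (the trace is trivial over F_p)."""
    return sum(A(x) * zeta_p**x for x in range(1, p))

def jacobi_sum(A, B):
    """J(A, B) = sum over x in F_p of A(x) B(1 - x)."""
    return sum(A(x) * B(1 - x) for x in range(p))

A, B, AB = char(1), char(2), char(3)   # characters of orders 6, 3 and 2
lhs = jacobi_sum(A, B)
rhs = gauss_sum(A) * gauss_sum(B) / gauss_sum(AB)
print(abs(lhs - rhs), abs(gauss_sum(A)) ** 2)
```

The second printed value illustrates the companion fact $|g(A)|^2 = p$ for nontrivial $A$.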
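Thus we correspondingly define ${}_1P_0[A; \lambda] := \overline{A}(1 - \lambda)$,

$${}_2P_1[A, B; C; \lambda] := \sum_{y \in \mathbb{F}_q} B(y)\, \overline{B}C(1 - y)\, {}_1P_0[A; \lambda y],$$

and define ${}_{n+1}P_n$ inductively. We obtain a nice point counting formula for the
hypergeometric variety

$$X_\lambda : \ y^N = x_1^{i_1} \cdots x_n^{i_n} (1 - x_1)^{j_1} \cdots (1 - x_n)^{j_n} (1 - \lambda x_1 \cdots x_n)^k.$$

In particular, we have for $q \equiv 1 \pmod{N}$,

$$\#X_\lambda(\mathbb{F}_q) = 1 + q^n + \sum_{m=1}^{N-1} {}_{n+1}P_n\left[\eta_N^{-mk}, \eta_N^{m i_n}, \ldots, \eta_N^{m i_1};\ \eta_N^{m i_n + m j_n}, \ldots, \eta_N^{m i_1 + m j_1};\ \lambda\right].$$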
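For $n = 1$ the point count can be compared with a brute-force enumeration. The sketch below works over a prime field $\mathbb{F}_p$ with $p \equiv 1 \pmod N$ and counts only affine solutions of $y^N = x^i(1 - x)^j(1 - \lambda x)^k$, so the expected value is $p + \sum_{m=1}^{N-1} {}_2P_1[\eta_N^{-mk}, \eta_N^{mi}; \eta_N^{mi + mj}; \lambda]$, without the extra $1$ that accounts for the point at infinity. It uses ${}_2P_1[A, B; C; \lambda] = \sum_y B(y)\,\overline{B}C(1 - y)\,\overline{A}(1 - \lambda y)$. The parameter values and all function names are assumptions made here for illustration.

```python
import cmath

def primitive_root(p):
    """Smallest generator of the multiplicative group of F_p (p an odd prime)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found")

def character(p, e):
    """The character x -> chi(x)^e with chi(g) = exp(2 pi i/(p-1)); value 0 at x = 0."""
    g = primitive_root(p)
    discrete_log = {pow(g, k, p): k for k in range(p - 1)}
    def chi(x):
        x %= p
        if x == 0:
            return 0
        return cmath.exp(2j * cmath.pi * e * discrete_log[x] / (p - 1))
    return chi

def period_2p1(p, a, b, c, lam):
    """2P1[A, B; C; lam] for A = chi^a, B = chi^b, C = chi^c."""
    B = character(p, b)
    BbarC = character(p, c - b)
    Abar = character(p, -a)
    return sum(B(y) * BbarC(1 - y) * Abar(1 - lam * y) for y in range(p))

def affine_count_via_periods(p, N, i, j, k, lam):
    s = (p - 1) // N  # eta_N = chi^s is a character of order N
    total = p
    for m in range(1, N):
        total += period_2p1(p, -m * k * s, m * i * s, m * (i + j) * s, lam)
    return total

def affine_count_brute_force(p, N, i, j, k, lam):
    return sum(
        1
        for x in range(p)
        for y in range(p)
        if (pow(y, N, p) - pow(x, i, p) * pow(1 - x, j, p) * pow(1 - lam * x, k, p)) % p == 0
    )

p, N, i, j, k, lam = 13, 3, 1, 2, 1, 5  # illustrative parameters with p = 1 mod N
via_periods = affine_count_via_periods(p, N, i, j, k, lam)
brute_force = affine_count_brute_force(p, N, i, j, k, lam)
print(via_periods, brute_force)
```

The character-sum value is computed in floating point, so it agrees with the integer count only up to rounding and carries a negligible imaginary part.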


Normalizing the period functions ${}_{n+1}P_n$ by dividing by the appropriate Jacobi
sums using the dictionary gives our finite field analogues of the classical
hypergeometric functions, which we denote by ${}_{n+1}\mathbb{F}_n$. For example when n = 1,

$${}_2\mathbb{F}_1[A, B; C; \lambda] := \frac{1}{J(B, C\overline{B})}\, {}_2P_1[A, B; C; \lambda].$$

If none of the “top” parameters are the trivial character, or match with one of the
“bottom” parameters, we call the ${}_{n+1}P_n$ or ${}_{n+1}\mathbb{F}_n$ primitive. Our definition of ${}_{n+1}\mathbb{F}_n$
has two nice properties that match the classical case: it is 1 when evaluated at 0, and
in the primitive case it is symmetric in both the top and bottom parameters.

We note that key properties such as the reflection and multiplication formulas for 
the Gamma function translate using the dictionary to properties of the Gauss sum. 
Our method ensures that any classical formula proved using these properties, as well
as their corollaries such as the Pfaff-Saalschütz formula, can be translated directly
(introducing error terms as needed) and thus we can indeed use an algorithmic type
approach to translation that closely parallels proofs in the classical case. However,
this approach does not work for everything in the classical setting; for example 
proofs involving a derivative structure cannot be translated in this way. 


3 Galois Interpretation 


We can interpret the ${}_{n+1}P_n$ or ${}_{n+1}\mathbb{F}_n$ functions as traces of Galois representations
at Frobenius elements via the corresponding hypergeometric algebraic varieties. In
the n = 1 case we make this explicit in the following theorem.

For a given number field K, denote its ring of integers by $\mathcal{O}_K$, its algebraic
closure by $\overline{K}$, and set $G_K := \mathrm{Gal}(\overline{K}/K)$. We call a prime ideal $\mathfrak{p}$ of $\mathcal{O}_K$ unramified
if it is coprime to the discriminant of K. Fix $\lambda \in \overline{\mathbb{Q}}$. Given a rational number of the
form $\frac{i}{m}$, a number field K containing $\mathbb{Q}(\zeta_m, \lambda)$ and a prime ideal $\mathfrak{p}$ of K coprime to
the discriminant of K, one can assign a multiplicative character $\iota_{\mathfrak{p}}(\frac{i}{m})$ to the residue
field $\mathcal{O}_K/\mathfrak{p}$ of $\mathfrak{p}$ with size $q(\mathfrak{p}) = |\mathcal{O}_K/\mathfrak{p}|$. This assignment, based on the mth power
residue symbol, is compatible with the Galois perspective when $\mathfrak{p}$ varies. It is also
compatible with field extensions of K. Thus the finite field analogues of classical
period (or hypergeometric) functions are viewed as the converted functions over the
finite residue fields, unless otherwise specified. We show the following.


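Theorem 1 Let $a, b, c \in \mathbb{Q}$ with least common denominator N such that $a, b, a - c,
b - c \notin \mathbb{Z}$ and $\lambda \in \overline{\mathbb{Q}} \setminus \{0, 1\}$. Let K be the Galois closure of $\mathbb{Q}(\lambda, \zeta_N)$ with the
ring of integers $\mathcal{O}_K$, and $\ell$ any prime. Then there is a 2-dimensional representation
$\sigma_{\lambda,\ell}$ of $G_K := \mathrm{Gal}(\overline{K}/K)$ over $\mathbb{Q}_\ell(\zeta_N)$, depending on a, b, c, such that for each
unramified prime ideal $\mathfrak{p}$ of $\mathcal{O}_K$ for which $\lambda$ and $1 - \lambda$ can be mapped to nonzero
elements in the residue field, the trace of $\sigma_{\lambda,\ell}$ evaluated at the arithmetic Frobenius
conjugacy class $\mathrm{Frob}_{\mathfrak{p}}$ at $\mathfrak{p}$ is an algebraic integer (independent of the choice of $\ell$), satisfying

$$\mathrm{Tr}\, \sigma_{\lambda,\ell}(\mathrm{Frob}_{\mathfrak{p}}) = -{}_2P_1\left[\iota_{\mathfrak{p}}(a), \iota_{\mathfrak{p}}(b);\ \iota_{\mathfrak{p}}(c);\ \lambda;\ q(\mathfrak{p})\right].$$

As a corollary to this theorem, given a primitive ${}_2P_1$ and a prime $\ell$ we can
compute the L-function of the corresponding 2-dimensional Galois representation
($\ell$-adic) as a product over good primes of terms involving ${}_2P_1$ and $({}_2P_1)^2$.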
4 Examples of Translated Identities


As examples of our techniques we use both our algorithmic type approach as well as 
the Galois perspective to translate several classical hypergeometric formulas to the 
finite field setting, including transformations of degree 1, 2, 3, algebraic identities, 
and evaluation formulas. 

For example, we translate a Clausen identity between the square of a ${}_2F_1$ and
a ${}_3F_2$. From a differential equations perspective, this identity indicates that the
symmetric square of the 2-dimensional solution space of the hypergeometric
differential equation (HDE) corresponding to the ${}_2F_1$ is the 3-dimensional
solution space of the HDE corresponding to the ${}_3F_2$. Translating this identity to the
finite field setting yields a finite field hypergeometric transformation due to Greene
and Evans [7] which in our notation more closely matches the classical identity.
From the representation theoretic perspective, it reflects the fact that the tensor
square of a 2-dimensional representation (associated to the ${}_2F_1$) is its symmetric
square (which is a 3-dimensional representation associated to the ${}_3F_2$) plus its
alternating square (which is a linear representation).

For an example of an algebraic type identity, consider the following identity in
Slater [14, (1.5.20)]:

$${}_2F_1\left[a, a - \tfrac{1}{2};\ 2a;\ z\right] = \left(\frac{1 + \sqrt{1 - z}}{2}\right)^{1 - 2a}.$$

To see its finite field analogue, it is tempting to translate the right hand side
into a corresponding character evaluated at $\frac{1 + \sqrt{1 - z}}{2}$ using the dictionary. However,
Theorem 1 implies that one character is insufficient as the corresponding Galois
representations should be 2-dimensional. Instead, our translated identity becomes
the following. For $\mathbb{F}_q$ of odd characteristic, $\phi$ the quadratic character, $A \in \widehat{\mathbb{F}_q^\times}$
having order at least 3, and $z \in \mathbb{F}_q$,

$${}_2\mathbb{F}_1\left[A, A\phi;\ A^2;\ z\right] = \left(\frac{1 + \phi(1 - z)}{2}\right)\left(\overline{A}^2\!\left(\frac{1 + \sqrt{1 - z}}{2}\right) + \overline{A}^2\!\left(\frac{1 - \sqrt{1 - z}}{2}\right)\right).$$

The proof is quite straightforward using only translations of Kummer's 24 relations
as well as the reflection and duplication formulas. This example highlights that the
Galois perspective allows us to predict analogues beyond the dictionary alone.

As another example, we use our dictionary technique to translate a quadratic ${}_2F_1$
transformation of Kummer [1, Thm. 3.1.1] which we first show can be proved using
only the multiplication and reflection formulas with the Pfaff-Saalschütz identity.
The finite field version we obtain is equivalent to a quadratic formula of Greene in
[10], but holds for all values in $\mathbb{F}_q$. Our proof (although it might appear technical
on the surface) is very straightforward. In comparison, the approaches of Evans
and Greene to higher order transformation formulas (such as [8, 10]) often involve
clever changes of variables. This example also demonstrates that our method has the
capacity to produce finite field analogues that are satisfied by all values in $\mathbb{F}_q$.

As an explicit application of finite field formulas in computing the arithmetic
invariants of hypergeometric varieties, we use the finite field quadratic
transformation from the previous example to obtain the decomposition of a generically
4-dimensional abelian variety arising naturally from the generalized Legendre curve
$y^{12} = x^9(1 - x)^5(1 - \lambda x)$.


Acknowledgements Many thanks to the International Mathematical Research Institute MATRIX 
in Australia for hosting the workshop on Hypergeometric Motives and Calabi—Yau Differential 
Equations where this talk was presented. 
Threefolds


Ling Long, Fang-Ting Tu, Noriko Yui, and Wadim Zudilin 


Abstract In this project, we establish the supercongruences for the 14 families 
of rigid hypergeometric Calabi–Yau threefolds conjectured by Rodriguez-Villegas
in 2003. 


1 Main Result 


The talk outlines the proof of the supercongruences for the 14 families of rigid 
hypergeometric Calabi–Yau threefolds conjectured by Rodriguez-Villegas [9].


Theorem 1 Let $d_1, d_2 \in \{1/2, 1/3, 1/4, 1/6\}$ or
$(d_1, d_2) = (1/5, 2/5), (1/8, 3/8), (1/10, 3/10), (1/12, 5/12)$.
Then for each prime $p > 5$, we have

$$ {}_4F_3\!\left[\begin{matrix} d_1 & 1-d_1 & d_2 & 1-d_2 \\ & 1 & 1 & 1 \end{matrix};\; 1\right]_{p-1} \equiv a_p(f_{d_1,d_2}) \mod p^3, $$

where the hypergeometric series on the left-hand side is truncated after $p - 1$
terms and $a_p(f_{d_1,d_2})$ is the pth coefficient of an explicit Hecke eigenform $f_{d_1,d_2}$
of weight 4 associated to the corresponding rigid Calabi–Yau manifold via the
modularity theorem.
2 Motivation


The term supercongruence refers to a congruence which is stronger than what the 
formal group law implies. In [3] Beukers proved 


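$$ A\!\left(\frac{p-1}{2}\right) = {}_4F_3\!\left[\begin{matrix} \frac{1-p}{2} & \frac{1-p}{2} & \frac{1+p}{2} & \frac{1+p}{2} \\ & 1 & 1 & 1 \end{matrix};\; 1\right] \equiv a_p(f) \mod p, $$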


where A(n) are the Apéry numbers 


$$ A(n) = \sum_{k=0}^{n} \binom{n}{k}^2 \binom{n+k}{k}^2 = {}_4F_3\!\left[\begin{matrix} -n & -n & n+1 & n+1 \\ & 1 & 1 & 1 \end{matrix};\; 1\right] $$


and $a_p(f)$ is the pth coefficient of the Hecke eigenform $\eta^4(2\tau)\eta^4(4\tau)$. In [3]
Beukers also conjectured the supercongruence


$$ A\!\left(\frac{p-1}{2}\right) \equiv a_p(f) \mod p^2. $$


This was proved by Ahlgren and Ono [1]. Their key idea is to use Greene's
hypergeometric function over finite fields to perform point counting on the
Calabi–Yau threefold


$$ x + \frac{1}{x} + y + \frac{1}{y} + z + \frac{1}{z} + w + \frac{1}{w} = 0, $$

which is modular. Later, Kilbourn [7] gave an extension of the supercongruence

$$ a_p(f) \equiv \sum_{j=0}^{p-1} 2^{-8j}\binom{2j}{j}^4 = {}_4F_3\!\left[\begin{matrix} \frac12 & \frac12 & \frac12 & \frac12 \\ & 1 & 1 & 1 \end{matrix};\; 1\right]_{p-1} \mod p^3, \qquad (1) $$


which was conjectured by Van Hamme. Kilbourn's proof relies mainly on p-adic
tools. Using techniques similar to those of Ahlgren, Ono and
Kilbourn, McCarthy [8] obtained the supercongruence


$$ {}_4F_3\!\left[\begin{matrix} \frac15 & \frac25 & \frac35 & \frac45 \\ & 1 & 1 & 1 \end{matrix};\; 1\right]_{p-1} \equiv a_p(f_{1/5,2/5}) \mod p^3, \qquad (2) $$


where $f_{1/5,2/5}$ is an explicit Hecke eigenform conjectured by Rodriguez-Villegas.
This supercongruence corresponds to the mirror quintic threefold in $\mathbb{P}^4$, whose
modularity was first established by Schoen [10]. The supercongruences given by
Kilbourn (1) and McCarthy (2) are particular instances of Rodriguez-Villegas's
conjectures.
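
The Beukers congruence and Kilbourn's supercongruence (1) can be checked numerically for small primes. The following is a minimal sketch, not part of the proof: it assumes the eigenform is $f = \eta^4(2\tau)\eta^4(4\tau) = q\prod_{n\ge1}(1-q^{2n})^4(1-q^{4n})^4$ as stated above, and the function names are illustrative.

```python
from math import comb


def eta_product_coeffs(n_terms):
    """Coefficients a_1..a_{n_terms} of q * prod_{n>=1} (1-q^(2n))^4 (1-q^(4n))^4."""
    # series[i] is the coefficient of q^i in the product without the leading q.
    series = [0] * n_terms
    series[0] = 1
    for step in (2, 4):
        for n in range(step, n_terms, step):
            for _ in range(4):
                # Multiply in place by (1 - q^n), from high degree to low.
                for i in range(n_terms - 1, n - 1, -1):
                    series[i] -= series[i - n]
    # The leading factor q shifts every exponent by one.
    return {k: series[k - 1] for k in range(1, n_terms + 1)}


def apery(n):
    """Apery number A(n) = sum_k binom(n,k)^2 binom(n+k,k)^2."""
    return sum(comb(n, k) ** 2 * comb(n + k, k) ** 2 for k in range(n + 1))


def truncated_4f3_half(p):
    """sum_{j<p} binom(2j,j)^4 / 2^(8j), reduced modulo p^3 (p an odd prime)."""
    mod = p ** 3
    inv256 = pow(256, -1, mod)  # 2 is invertible modulo an odd prime power
    return sum(comb(2 * j, j) ** 4 * pow(inv256, j, mod) for j in range(p)) % mod


if __name__ == "__main__":
    a = eta_product_coeffs(20)
    for p in (3, 5, 7, 11):
        beukers = (apery((p - 1) // 2) - a[p]) % p ** 2 == 0
        kilbourn = (truncated_4f3_half(p) - a[p]) % p ** 3 == 0
        print(p, a[p], beukers, kilbourn)
```

Exact integer arithmetic is used throughout, so the comparison modulo $p^2$ and $p^3$ involves no rounding.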

In this joint project, our main motivation is to study the arithmetic aspect 
of rigid hypergeometric type Calabi-Yau manifolds. The first step is verifying 
the supercongruences conjectured by Rodriguez-Villegas coming from the well- 
known 14 hypergeometric families of Calabi-Yau threefolds whose Picard—Fuchs 
equations are degree 4 hypergeometric differential equations with solution near 0 of 
the form 


$$ {}_4F_3\!\left[\begin{matrix} d_1 & 1-d_1 & d_2 & 1-d_2 \\ & 1 & 1 & 1 \end{matrix};\; z\right], $$


where $d_1, d_2$ are as in Theorem 1. The value $z = 1$ corresponds to the singularity
of the hypergeometric differential equation, which is equivalent to getting a rigid
Calabi–Yau threefold in the fibre. Due to Gouvêa and Yui [6], a rigid Calabi–Yau
threefold defined over $\mathbb{Q}$ is modular. This means that the L-function associated with the
third étale cohomology group of a rigid Calabi–Yau threefold V in the 14 families
is equal to the L-function of an explicit Hecke eigenform of weight 4 conjectured
by Rodriguez-Villegas.

Very recently, Fuselier and McCarthy [5] established the case $(d_1, d_2) =
(1/2, 1/4)$. In this joint project, we provide a more general method to verify
the remaining 11 cases of the supercongruences conjectured by Rodriguez-Villegas.


3 Key Ideas and Example 


The strategy of our proof is to use hypergeometric motives over $\mathbb{Q}$ to describe
the arithmetic background. There are different versions of hypergeometric motives,
such as those given by Katz, Greene and McCarthy. However, for our purposes, the
most convenient one is the general version given by Beukers, Cohen and Mellit
in [4]. They modify Katz's finite hypergeometric function $H(\alpha, \beta; \lambda)$ so that
their version works for all primes p. They also give a recipe to realize toric
models as hypergeometric motives arising from hypergeometric data of a certain type,
$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_d)$ and $\beta = (\beta_1, \beta_2, \ldots, \beta_d)$, where $\alpha_i, \beta_j \in \mathbb{Q}$. In such cases,
one can express the number of rational points over finite fields on the given model
in terms of $H(\alpha, \beta; \lambda)$.

For example, the multi-sets $\alpha = (1/3, 2/3, 1/3, 2/3)$ and $\beta = (1, 1, 1, 1)$ give
the hypergeometric type Calabi–Yau threefold with $d_1 = d_2 = 1/3$. The toric model
in this case corresponds to the resolution of singularities on the affine variety given
by projective equations


$$ W:\; x_1 + x_2 + x_3 + x_4 = y_1 + y_2 + y_3 + y_4 = 0, \quad (x_1 y_1)^3 = 3^6\, x_2 x_3 x_4 y_2 y_3 y_4, \quad x_i, y_j \neq 0. $$
The resulting manifold W is the rigid Calabi–Yau threefold labelled as $V_{3,3}$ by
Batyrev and van Straten in [2].


In the talk, the principal ideas of our proof are illustrated in the case $d_1 = d_2 = 1/3$.
Fernando Rodriguez Villegas


Abstract We study classical hypergeometric series as a p-adic function of its 
parameters inspired by a problem in the Monthly solved by D. Zagier. 


1 Introduction 


The classical generalized hypergeometric series is defined by 


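$$ {}_rF_{r-1}\!\left[\begin{matrix} \alpha_1 & \ldots & \alpha_r \\ \beta_1 & \ldots & \beta_{r-1} \end{matrix} \,\Big|\, t\right] = \sum_{k \ge 0} \frac{(\alpha_1)_k \cdots (\alpha_r)_k}{(\beta_1)_k \cdots (\beta_{r-1})_k} \frac{t^k}{k!} \qquad (1) $$
for $\alpha_i \in \mathbb{C}$ and $\beta_j \in \mathbb{C} \setminus \{0, -1, -2, \ldots\}$. If $\alpha_r = -n$ for n a non-negative integer
the series terminates and we have

$$ {}_rF_{r-1}\!\left[\begin{matrix} \alpha_1 & \ldots & \alpha_{r-1} & -n \\ \beta_1 & \ldots & \beta_{r-1} & \end{matrix} \,\Big|\, t\right] = \sum_{k=0}^{n} (-1)^k \binom{n}{k} \frac{(\alpha_1)_k \cdots (\alpha_{r-1})_k}{(\beta_1)_k \cdots (\beta_{r-1})_k}\, t^k. \qquad (2) $$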
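
The passage from (1) to (2) rests on the identity $(-n)_k/k! = (-1)^k\binom{n}{k}$. As a hedged illustration, the sketch below evaluates both forms in exact rational arithmetic and compares them; the function names and the sample parameters are assumptions made only for this example.

```python
from fractions import Fraction
from math import comb, factorial


def pochhammer(a, k):
    """Rising factorial (a)_k = a (a+1) ... (a+k-1)."""
    result = Fraction(1)
    for i in range(k):
        result *= a + i
    return result


def ratio(alphas, betas, k):
    """Product of (alpha)_k over the product of (beta)_k."""
    value = Fraction(1)
    for a in alphas:
        value *= pochhammer(a, k)
    for b in betas:
        value /= pochhammer(b, k)
    return value


def terminating_via_series(n, alphas, betas, t):
    """Series (1) with the extra upper parameter alpha_r = -n; terms with k > n vanish."""
    upper = list(alphas) + [Fraction(-n)]
    return sum(ratio(upper, betas, k) * t ** k / factorial(k) for k in range(n + 1))


def terminating_via_binomial(n, alphas, betas, t):
    """Finite sum (2) written with binomial coefficients."""
    return sum((-1) ** k * comb(n, k) * ratio(alphas, betas, k) * t ** k
               for k in range(n + 1))


if __name__ == "__main__":
    alphas = [Fraction(1, 2), Fraction(1, 3)]
    betas = [Fraction(1), Fraction(5, 4)]
    t = Fraction(2, 7)
    for n in range(5):
        print(n, terminating_via_series(n, alphas, betas, t)
              == terminating_via_binomial(n, alphas, betas, t))
```

Form (2) is the one that extends to a Mahler series, since $\binom{n}{k}$ makes sense for p-adic n.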


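One can show that for fixed $\alpha_i \in \mathbb{Z}_p$, $\beta_i \in \mathbb{Z}_p \setminus \{0, -1, -2, \ldots\}$ and $|t|_p < 1$
this yields a convergent Mahler series and hence a continuous function f of the
variable $x := n$ in $\mathbb{Z}_p$:


$$ f(x) := {}_rF_{r-1}\!\left[\begin{matrix} \alpha_1 & \ldots & \alpha_{r-1} & -x \\ \beta_1 & \ldots & \beta_{r-1} & \end{matrix} \,\Big|\, t\right] = \sum_{k \ge 0} (-1)^k \binom{x}{k} \frac{(\alpha_1)_k \cdots (\alpha_{r-1})_k}{(\beta_1)_k \cdots (\beta_{r-1})_k}\, t^k. $$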


These functions seem very interesting and worthy of further investigation. 
2 The Problem


It turns out that a special case of these functions appears in the solution of an 
interesting Monthly problem [3] solved by D. Zagier. The problem is to prove that 


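$$ v_3\!\left(\sum_{k=0}^{n-1} \binom{2k}{k}\right) = v_3\!\left(n^2 \binom{2n}{n}\right), \qquad (4) $$


where $v_p$ denotes the p-adic valuation. Zagier does this by showing that there is a
continuous function $f_1 : \mathbb{Z}_3 \to -1 + 3\mathbb{Z}_3$ which interpolates the values


$$ f_1(n) = \frac{\sum_{k=0}^{n-1} \binom{2k}{k}}{n^2 \binom{2n}{n}}, \qquad n = 1, 2, \ldots. \qquad (5) $$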
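
The valuation identity and the congruence $f_1(n) \equiv -1 \bmod 3$ are easy to test numerically. The following is a minimal sketch, assuming the problem is the identity $v_3\big(\sum_{k<n}\binom{2k}{k}\big) = v_3\big(n^2\binom{2n}{n}\big)$ and that $f_1(n)$ is the quotient of these two integers; the helper names are illustrative.

```python
from math import comb


def valuation(m, p):
    """p-adic valuation of a nonzero integer m."""
    v = 0
    while m % p == 0:
        m //= p
        v += 1
    return v


def central_sum(n):
    """sum_{k=0}^{n-1} binom(2k, k)."""
    return sum(comb(2 * k, k) for k in range(n))


def denominator(n):
    """n^2 * binom(2n, n)."""
    return n * n * comb(2 * n, n)


def f1_mod3(n):
    """Residue modulo 3 of f_1(n), valid when both 3-adic valuations agree."""
    num, den = central_sum(n), denominator(n)
    num //= 3 ** valuation(num, 3)
    den //= 3 ** valuation(den, 3)
    return num * pow(den, -1, 3) % 3


if __name__ == "__main__":
    for n in range(1, 10):
        print(n, valuation(central_sum(n), 3), valuation(denominator(n), 3), f1_mod3(n))
```

A residue of 2 in the last column corresponds to $f_1(n) \in -1 + 3\mathbb{Z}_3$.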


Considering the expansion 


$$ f_1(n) = A + Bn + Cn^2 + \ldots $$


he goes further and conjectures, based on numerical evidence, that B = 0; moreover, 
he mentions 


Another interesting problem would be to evaluate in closed form the 3-adic number A. 


We prove that in fact 
3 
A= = 58302) = 992297 4 9 9° 4.3) 4 9 3 4 OG), 16) 


where $\zeta_3(s)$ is the Kubota–Leopoldt 3-adic zeta function.


3 Periods 


The connection with zeta values is perhaps to be expected: in general the Taylor 
coefficients of the functions of Sect.1 involve multiple polylogarithms. In the 
specific case in question we have 


a | 
f@m=>> on (;) (-3)',  f@) =nfi(n). (7) 
k=1 (x) 


If we expand in general 


1 
f(x) = > ale = So bn(t)x", 


k>1 \k n=0 
then


(ai 


(¢+4) Gi + a)G2 + 5)-°° Gn $5). 


OS ji<ja<-<Jn 


by(t) = (8) 


These multiple polylogarithms can be expressed in terms of usual polylogarithms
for small n. Trivially $b_0 = 0$. For $n = 1$ we have the following identity of power
series in $z = 1 - w$


bi((w — w7')?) = (w?-w*) | log (w*), = w= 1-2. (9) 


By plugging in a primitive third root of unity $\zeta_3 \in \mathbb{C}_3$ for w (so that
$(w - w^{-1})^2 = -3$) it follows that 3-adically
we have $b_1(-3) = 0$. This shows that in this case $f(x)$ is divisible by x and we may
consider $f_1(x) := f(x)/x$ (see [3]).

With some effort one can prove that as power series in z, with w = 1 — z, we 
have 


bo(w — w !)?) = (w? — w~*)| [Lin — w*) — ZLin(1 — w*) 


(10) 
—Lig(d — w7?) + 3Lind — w™4)), 


where $\mathrm{Li}_2$ is the standard dilogarithm function.

Plugging in $w = \zeta_3 \in \mathbb{C}_3$ into (10) and using a result of Coleman [1] we
obtain (6). The identity is the special case $p = 3$, $r = 1$ of the following. Given
a prime $p > 2$ fix $\zeta_p \in \mathbb{C}_p$ a primitive p-th root of unity.


Theorem 1 
i) The following limit exists 


p-l 


1 2k s 
Alp) = sin, iy pas » (Geter at (11) 
ps 


k=0 
ii) Let $\omega : \mathbb{F}_p^\times \to \mathbb{C}_p$ be the Teichmüller character. For $0 < r < p - 1$ we have


1 = =r rei —~2j i r-1 
WO ire Ll (5 —b,° JAC) = Lp2, 0), (12) 


where $L_p$ is Kubota–Leopoldt's p-adic L-function.


We note in passing that 


; 2p°\ _ Pp(2p*) 
sim, ( ) ae 2q] Ty (p*) 


k>1 
(see [2, §6.3.4, ex. 16]), where $\Gamma_p$ denotes the p-adic gamma function.

The beauty of the expressions (9) and (10) is that, though their proofs were
obtained working over the complex numbers, they are identities of power series
with rational coefficients and hence also hold p-adically in an appropriate domain.
Fortunately, this domain includes the point we need to evaluate at for Zagier's
questions ($w = \zeta_3 \in \mathbb{C}_3$).

For $n = 3$ there is an expression for $b_3(t)$ in terms of polylogarithms valid over
the complex numbers, which is much more difficult to obtain. For $n > 3$ we do not
expect $b_n(t)$ to reduce to polylogarithms.

However, to apply this expression for b3 to our p-adic setting requires some 
form of analytic continuation. This we will achieve by delicate manipulations using 
Coleman’s integration but the details have not yet been fully carried out. 

The expectation nevertheless is that for $p = 3$ we should have that $b_3(-3)$ is a
simple multiple of $L_3(3, \chi_{-3})$. But $L_3(s, \chi_{-3})$ is identically zero since $\chi_{-3}$ is odd!
Hence the constant B of Zagier should vanish because it is a special value of an
L-function which happens to be identically zero.


4 Speculation 


We tested numerically to see if there are any other relations between $b_n(t)$ and p-adic
L-values and found only the following likely identities:


_ | b4(—3) = — 24) _ | b2(-5) = 0 


Q3: 
** |b6(—3) = —28203(6) "| bs(—5) = — 280503) 


(13) 


but we did not attempt to prove these. We pointed out above that $b_4(t)$ and $b_6(t)$ are
not expected to be expressible in terms of polylogarithms. Hence the
observed identities for $b_4(-3)$ and $b_6(-3)$ in $\mathbb{Q}_3$ appear to be less obvious than
the others.
Abstract For a multivariate Laurent polynomial $f(x)$ with coefficients in a ring R
we construct a sequence of matrices with entries in R whose reductions modulo
p give iterates of the Hasse–Witt operation for the hypersurface of zeroes of
the reduction of $f(x)$ modulo p. We show that our matrices satisfy a system of
congruences modulo powers of p. If the Hasse–Witt operation is invertible these
congruences yield p-adic limit formulas, which conjecturally describe the
Gauss–Manin connection and the Frobenius operator on the slope 0 part of a crystal
attached to $f(x)$. We also apply our results on congruences to integrality of formal
group laws of Artin–Mazur kind.


1 Hasse—Witt Matrix 


Let $X/\mathbb{F}_q$ be a smooth projective variety of dimension n over a finite field with
$q = p^a$ elements. The congruence formula due to Katz (see [1]) states that modulo
p the zeta function of X is described as


$$ Z(X/\mathbb{F}_q; T) \equiv \prod_{i=0}^{n} \det\!\left(1 - T \cdot \mathcal{F}^a \mid H^i(X, \mathcal{O}_X)\right)^{(-1)^{i+1}} \mod p, \qquad (1) $$


where $H^i(X, \mathcal{O}_X)$ is the cohomology of X with coefficients in the structure sheaf
$\mathcal{O}_X$ and $\mathcal{F}$ is the Frobenius map, the p-linear vector space map induced by
$h \mapsto h^p$ on the structure sheaf (p-linear means $\mathcal{F}(bs + ct) = b^p\mathcal{F}(s) + c^p\mathcal{F}(t)$ for
$b, c \in \mathbb{F}_q$ and $s, t \in H^i(X, \mathcal{O}_X)$). When X is a complete intersection the only
interesting term in formula (1) is given by $H^n(X, \mathcal{O}_X)$. The action of $\mathcal{F}$ on this
space is classically known as the Hasse–Witt operation.
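The following algorithm (see [2, §7.10], [1, Corollary 6.1.13] or [3, §1.1]) can
be used to compute the Hasse–Witt matrix of a hypersurface $X \subset \mathbb{P}^{n+1}$ given by a
homogeneous equation $f(x_0, \ldots, x_{n+1}) = 0$ of degree $d > n + 2$. One extends the
Frobenius to a transformation of the exact sequence of sheaves on $\mathbb{P}^{n+1}$:

$$ \begin{array}{ccccccccc}
0 & \to & \mathcal{O}_{\mathbb{P}^{n+1}}(-d) & \xrightarrow{\;f\;} & \mathcal{O}_{\mathbb{P}^{n+1}} & \to & \mathcal{O}_X & \to & 0 \\
 & & \downarrow f^{p-1}\mathcal{F} & & \downarrow \mathcal{F} & & \downarrow \mathcal{F} & & \\
0 & \to & \mathcal{O}_{\mathbb{P}^{n+1}}(-d) & \xrightarrow{\;f\;} & \mathcal{O}_{\mathbb{P}^{n+1}} & \to & \mathcal{O}_X & \to & 0
\end{array} $$

The coboundary in the resulting long exact cohomology sequence allows one to identify


$$ H^n(X, \mathcal{O}_X) \cong H^{n+1}(\mathbb{P}^{n+1}, \mathcal{O}_{\mathbb{P}^{n+1}}(-d)), $$


so that the Frobenius $\mathcal{F}$ on $H^n(X, \mathcal{O}_X)$ corresponds to the map on
$H^{n+1}(\mathbb{P}^{n+1}, \mathcal{O}_{\mathbb{P}^{n+1}}(-d))$ induced by


$$ \mathcal{O}_{\mathbb{P}^{n+1}}(-d) \xrightarrow{\;\mathcal{F}\;} \mathcal{O}_{\mathbb{P}^{n+1}}(-pd) \xrightarrow{\;f^{p-1}\;} \mathcal{O}_{\mathbb{P}^{n+1}}(-d). $$


Computing Čech cohomology we find that the Laurent monomials
$x^{-u} = x_0^{-u_0} \cdots x_{n+1}^{-u_{n+1}}$, where u runs through the set

$$ U = \Big\{ u = (u_0, \ldots, u_{n+1}) : u_i \in \mathbb{Z}_{\ge 1},\; \sum_{i=0}^{n+1} u_i = d \Big\}, \qquad (2) $$


form a basis in $H^{n+1}(\mathbb{P}^{n+1}, \mathcal{O}_{\mathbb{P}^{n+1}}(-d))$ and the Hasse–Witt matrix is given in this
basis by


$$ \mathcal{F}_{u,v \in U} = \text{the coefficient of } x^{pv-u} \text{ in } f(x)^{p-1}. \qquad (3) $$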
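
Formula (3) is directly computable. The sketch below is an illustration under stated assumptions rather than the author's implementation: polynomials are stored as dictionaries from exponent tuples to coefficients modulo p, the helper names are invented for this example, and the test curve $y^2z = x^3 + xz^2 + z^3$ (so $n = 1$, $d = 3$, $U = \{(1,1,1)\}$) is a hypothetical choice. For a plane cubic the single matrix entry should agree with $p + 1 - \#X(\mathbb{F}_p)$ modulo p, which the script compares by brute-force point counting.

```python
from itertools import product


def poly_mul(f, g, p):
    """Multiply polynomials stored as {exponent tuple: coefficient}, reducing mod p."""
    out = {}
    for ef, cf in f.items():
        for eg, cg in g.items():
            e = tuple(a + b for a, b in zip(ef, eg))
            c = (out.get(e, 0) + cf * cg) % p
            if c:
                out[e] = c
            else:
                out.pop(e, None)
    return out


def poly_pow(f, m, p):
    """m-th power of f modulo p by repeated multiplication."""
    nvars = len(next(iter(f)))
    result = {(0,) * nvars: 1}
    for _ in range(m):
        result = poly_mul(result, f, p)
    return result


def hasse_witt(f, d, p):
    """Index set U and matrix of coefficients of x^(p v - u) in f^(p-1), as in (3)."""
    nvars = len(next(iter(f)))
    U = [u for u in product(range(1, d + 1), repeat=nvars) if sum(u) == d]
    fp = poly_pow(f, p - 1, p)
    matrix = [[fp.get(tuple(p * vi - ui for vi, ui in zip(v, u)), 0) for v in U]
              for u in U]
    return U, matrix


def evaluate(f, point, p):
    """Value of f at a point of F_p^nvars."""
    total = 0
    for exps, c in f.items():
        term = c
        for x, e in zip(point, exps):
            term = term * pow(x, e, p) % p
        total = (total + term) % p
    return total


def projective_point_count(f, p):
    """Number of F_p-points of the projective hypersurface f = 0 (brute force)."""
    nvars = len(next(iter(f)))
    affine = sum(1 for pt in product(range(p), repeat=nvars)
                 if any(pt) and evaluate(f, pt, p) == 0)
    return affine // (p - 1)


# Hypothetical example: y^2 z - x^3 - x z^2 - z^3 in the variables (x, y, z).
CUBIC = {(0, 2, 1): 1, (3, 0, 0): -1, (1, 0, 2): -1, (0, 0, 3): -1}

if __name__ == "__main__":
    for p in (5, 7, 11):
        U, hw = hasse_witt(CUBIC, 3, p)
        trace = p + 1 - projective_point_count(CUBIC, p)
        print(p, U, hw[0][0], trace % p)
```

The repeated multiplication in `poly_pow` is adequate for small p; larger examples would call for extracting only the needed coefficients.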


Suppose one starts from a polynomial f in characteristic 0, e.g. with coefficients
in $\mathbb{Z}$. In my talk at the MATRIX institute in Creswick I presented a construction
which lifts (3) to a matrix with entries in $\mathbb{Z}_p$ whose characteristic polynomial
conjecturally gives the p-adic unit root part of the zeta function attached to the
middle cohomology of X. The proofs and some evidence for the conjecture can be
found in [4].
We study a sequence of matrices which generalize (3). Let R be a commutative
characteristic 0 ring, that is, the natural map $R \to R \otimes \mathbb{Q}$ is an embedding.
Let $f \in R[x_1^{\pm 1}, \ldots, x_N^{\pm 1}]$ be a Laurent polynomial in N variables. If $f(x) =
\sum_u a_u x^u$, $a_u \in R$, the Newton polytope $\Delta(f) \subset \mathbb{R}^N$ is the convex hull of the finite
set $\{u : a_u \neq 0\}$. Consider the set of internal integral points $J = \Delta(f)^\circ \cap \mathbb{Z}^N$,
where $\Delta(f)^\circ$ denotes the topological interior of the Newton polytope. Let $g = \#J$
be the number of internal integral points in the Newton polytope, which we assume
to be positive. Consider the following sequence of $g \times g$ matrices with entries in R
whose rows and columns are indexed by the elements of J:


$$ (\beta_m)_{u,v \in J} = \text{the coefficient of } x^{(m+1)v-u} \text{ in } f(x)^m. \qquad (4) $$


By convention, $\beta_0$ is the identity matrix. We shall consider arithmetic properties of
the sequence $\{\beta_m;\ m \ge 0\}$.

Let us fix a prime number p. We restrict our attention to the sub-sequence $\{\alpha_s =
\beta_{p^s-1};\ s \ge 0\}$. The entries of these matrices are then given by


$$ (\alpha_s)_{u,v \in J} = \text{the coefficient of } x^{p^s v-u} \text{ in } f(x)^{p^s-1}. $$


Notice that when $R/pR$ is a finite field and f is a homogeneous polynomial of
degree d such that its reduction modulo p defines a smooth hypersurface, then U
in (2) coincides with J (with $N = n+2$) and $\alpha_1 = \beta_{p-1}$ modulo p is the Hasse–Witt
matrix.


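Theorem 1 Assume that the ring R is endowed with a pth power Frobenius
endomorphism, that is, a ring endomorphism $\sigma : R \to R$ satisfying $\sigma(a) \equiv a^p$
mod p for all $a \in R$. Then for every s
$$ \alpha_s \equiv \alpha_1 \cdot \sigma(\alpha_1) \cdot \ldots \cdot \sigma^{s-1}(\alpha_1) \mod p. \qquad (5) $$
If $\alpha_1$ is invertible modulo p then for every $s \ge 1$ one has congruences
$$ \alpha_{s+1} \cdot \sigma(\alpha_s)^{-1} \equiv \alpha_s \cdot \sigma(\alpha_{s-1})^{-1} \mod p^s \qquad (6) $$
and


$$ D(\alpha_s) \cdot \alpha_s^{-1} \equiv D(\alpha_{s-1}) \cdot \alpha_{s-1}^{-1} \mod p^s \qquad (7) $$


for any derivation $D : R \to R$.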
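
Congruences (5) and (6) can be observed in the simplest case $g = 1$. The sketch below is an illustration under assumptions not taken from the text: it uses $R = \mathbb{Z}$ with $\sigma$ the identity and the hypothetical example $f(x) = x + 1 + 1/x$, whose Newton polytope $[-1, 1]$ has the single interior integral point 0, so that $\beta_m$ is the constant term of $f^m$. Since the matrices are then invertible scalars modulo p, (6) is checked in the cross-multiplied form $\alpha_{s+1}\alpha_{s-1} \equiv \alpha_s^2 \bmod p^s$.

```python
from math import comb


def beta(m):
    """Constant term of (x + 1 + 1/x)^m, i.e. the 1x1 matrix beta_m from (4)."""
    return sum(comb(m, 2 * k) * comb(2 * k, k) for k in range(m // 2 + 1))


def alpha(s, p):
    """alpha_s = beta_{p^s - 1}."""
    return beta(p ** s - 1)


def check_congruences(p, s_max):
    """Check (5) and the cross-multiplied form of (6) for 1 <= s <= s_max."""
    a = [alpha(s, p) for s in range(s_max + 2)]
    if a[1] % p == 0:
        raise ValueError("alpha_1 is not invertible modulo p")
    for s in range(1, s_max + 1):
        # (5) with sigma = identity: alpha_s is congruent to alpha_1^s modulo p.
        if (a[s] - pow(a[1], s, p)) % p != 0:
            return False
        # (6) for scalars: alpha_{s+1} * alpha_{s-1} is congruent to alpha_s^2 modulo p^s.
        if (a[s + 1] * a[s - 1] - a[s] * a[s]) % p ** s != 0:
            return False
    return True


if __name__ == "__main__":
    for p in (5, 7):
        print(p, alpha(1, p), check_congruences(p, 2))
```

The prime 3 is excluded here because $\beta_2 = 3$, so $\alpha_1$ is not invertible modulo 3 for this f.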


Congruence (5) shows that $\alpha_s$ mod p are iterates of the Hasse–Witt operation
whenever the latter is defined. It also implies that when $\alpha_1$ is invertible modulo p
then all $\alpha_s$ are invertible modulo p and hence also modulo $p^s$ for all s. Therefore
statements (6) and (7) make sense. We remark that analogous congruences also hold
when one multiplies by the inverse matrices on the left, that is, we can prove that
$\sigma(\alpha_s)^{-1} \cdot \alpha_{s+1} \equiv \sigma(\alpha_{s-1})^{-1} \cdot \alpha_s$ and $\alpha_s^{-1} \cdot D(\alpha_s) \equiv \alpha_{s-1}^{-1} \cdot D(\alpha_{s-1})$ mod $p^s$.

Our results are related to the topic of the workshop because when $\Delta(f)$ is a
reflexive polytope (in this case $g = 1$), the toric hypersurface of zeroes of f can
be compactified to a Calabi–Yau variety. Congruence (6) then generalizes the so-called
Dwork congruences (see [5, 6]) and (7) seems to be new even in the Calabi–Yau
case.

Theorem 1 implies the existence of the p-adic limits


$$ F = \lim_{s \to \infty} \alpha_{s+1} \cdot \sigma(\alpha_s)^{-1} \qquad (8) $$
and
$$ \nabla_D = \lim_{s \to \infty} D(\alpha_s) \cdot \alpha_s^{-1} \quad \text{for every derivation } D \in \mathrm{Der}(R). \qquad (9) $$


These $g \times g$ matrices have entries in the p-adic closure $\hat{R} = \varprojlim R/p^sR$.


Note that $F \equiv \alpha_1$ mod p. We are currently working on identifying the limiting
matrices (8) and (9) with the Frobenius and Gauss–Manin connection on the slope 0
part of a crystal attached to the Laurent polynomial f. This fact was conjectured
in [4] based on several examples and analogy with the congruences for expansion
coefficients of differential forms stated in [7]. The progress in this project is due
to our collaboration with Frits Beukers, which started at the MATRIX institute.
I am also grateful to Frits for the series of extremely helpful lectures on Dwork
cohomology which he gave during the first week of the program.

Matrices (4) showed up in [8] as coefficients of the logarithms of explicit
coordinatizations of the Artin–Mazur formal group laws of projective hypersurfaces
and complete intersections. Under certain conditions (e.g. R is the ring of integers
of the unramified extension of $\mathbb{Q}_p$ of degree a and f is a homogeneous polynomial
whose reduction modulo p defines a non-singular hypersurface $X/\mathbb{F}_{p^a}$) one can
combine (6) with the generalized Atkin and Swinnerton-Dyer congruences in [9],
which yields that the eigenvalues of $\Phi = F \cdot \sigma(F) \cdot \ldots \cdot \sigma^{a-1}(F)$ are p-adic unit
eigenvalues of the Frobenius operator on the middle crystalline cohomology of X
(see [4, Section 5]).

Our second result is the following integrality theorem for formal group laws
attached to a Laurent polynomial. Its proof is based on explicit congruences (similar
to those in Theorem 1) and Hazewinkel's functional equation lemma (see [4, Section 4]).


Theorem 2 Let J be either the set $\Delta(f) \cap \mathbb{Z}^N$ of all integral points in the Newton
polytope of f or the subset of internal integral points $\Delta(f)^\circ \cap \mathbb{Z}^N$. Assume that J
is non-empty and let $g = \#J$. Consider the sequence of matrices $\beta_m \in \mathrm{Mat}_{g \times g}(R)$,
$m \ge 0$, given by formula (4) and define a g-tuple of formal power series
$l(\tau) = (l_u(\tau))_{u \in J}$ in g variables $\tau = (\tau_v)_{v \in J}$ as
$$ l(\tau) = \sum_{m=1}^{\infty} \frac{1}{m}\, \beta_{m-1}\, \tau^m. $$


Consider the g-dimensional formal group law $G_f(\tau, \tau') = l^{-1}(l(\tau) + l(\tau'))$ with
coefficients in $R \otimes \mathbb{Q}$.

Let p be a prime number. If R can be endowed with a pth power Frobenius
endomorphism then $G_f$ is p-integral, that is, $G_f \in R_{(p)}[[\tau, \tau']]$ where $R_{(p)} =
R \otimes \mathbb{Z}_{(p)}$ is the subring of $R \otimes \mathbb{Q}$ formed by elements without p in the denominator.


Note that if one can define a Frobenius endomorphism on R for every prime p
then Theorem 2 implies that $G_f \in R[[\tau, \tau']]$ because the subring $\bigcap_p R_{(p)} \subset R \otimes \mathbb{Q}$
coincides with R. For example, the rings $\mathbb{Z}$ and $\mathbb{Z}[t]$ are of this type: one can take the
Frobenius endomorphism to be the identity on $\mathbb{Z}$ and $h(t) \mapsto h(t^p)$ on $\mathbb{Z}[t]$.
Triangular Modular Curves


John Voight 


Abstract We consider certain generalizations of modular curves arising from 
congruence subgroups of triangle groups. 


1 Triangle Groups 


Let $a, b, c \in \mathbb{Z}_{\ge 2} \cup \{\infty\}$ satisfy $a \le b \le c$. Consider the triangle T with angles
$\pi/a, \pi/b, \pi/c$ (with $1/\infty = 0$) in the space H, where H is the sphere, Euclidean
plane, or hyperbolic plane according as the quantity $\chi(a, b, c) = 1/a + 1/b + 1/c - 1$
is positive, zero, or negative. Let $\tau_a, \tau_b, \tau_c$ be reflections in the sides of T and let
$\Delta = \Delta(a, b, c)$ be the subgroup of orientation-preserving isometries in the group
generated by the reflections: then $\Delta$ is generated by


$$ \delta_a = \tau_b\tau_c, \quad \delta_b = \tau_c\tau_a, \quad \delta_c = \tau_a\tau_b $$


and has a presentation


$$ \Delta = \langle \delta_a, \delta_b, \delta_c \mid \delta_a^a = \delta_b^b = \delta_c^c = \delta_a\delta_b\delta_c = 1 \rangle. $$
We call $\Delta$ a triangle group. The quotient
$$ X = X(a, b, c; 1) = \Delta(a, b, c) \backslash H $$


is a complex Riemannian 1-orbifold of genus zero; it has as many punctures as
occurrences of $\infty$ among a, b, c.
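
The trichotomy above depends only on the sign of $\chi(a, b, c)$, so it is simple to compute. The following is a small sketch with illustrative function names; `math.inf` stands in for $\infty$, and exact fractions avoid rounding in the Euclidean case.

```python
from fractions import Fraction
from math import inf


def chi(a, b, c):
    """chi(a, b, c) = 1/a + 1/b + 1/c - 1, with 1/infinity = 0."""
    return sum(Fraction(0) if n == inf else Fraction(1, n) for n in (a, b, c)) - 1


def geometry(a, b, c):
    """Geometry of the space H carrying the triangle with angles pi/a, pi/b, pi/c."""
    x = chi(a, b, c)
    if x > 0:
        return "spherical"
    if x == 0:
        return "Euclidean"
    return "hyperbolic"


def punctures(a, b, c):
    """Number of punctures of the quotient: occurrences of infinity among a, b, c."""
    return sum(1 for n in (a, b, c) if n == inf)


if __name__ == "__main__":
    for triple in [(2, 3, 3), (2, 3, 5), (2, 3, 6), (2, 3, 7), (2, 3, inf), (inf, inf, inf)]:
        print(triple, chi(*triple), geometry(*triple), punctures(*triple))
```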


Example 1 We have $\Delta(2, 3, 3) \simeq A_4$, and the other spherical triangle groups (i.e.,
those with $\chi(a, b, c) > 0$) correspond to the Platonic solids. The Euclidean triangle
groups are the familiar tessellations of the plane by triangles. We have $\Delta(2, 3, \infty) \simeq
\mathrm{PSL}_2(\mathbb{Z})$ and $\Delta(\infty, \infty, \infty) \simeq \Gamma(2)$, the free group on two generators.


A uniformizer for X is expressed by an explicit ratio of $_2F_1$-hypergeometric
functions, with parameters given in terms of a, b, c. As a consequence, containments
of triangle groups imply relations between $_2F_1$-hypergeometric functions, with
arguments given by Belyi maps. Moreover, the quotient is a moduli space for certain
abelian varieties, often called hypergeometric abelian varieties: the values of the
hypergeometric functions are periods of the generalized Legendre curve


$$ y^N = x^A (1 - x)^B (1 - \lambda x)^C $$


for certain integers A, B, C, N again given explicitly in terms of a, b,c. 

The triangle group $\Delta$ is arithmetic if and only if it is commensurable with the
units of reduced norm 1 in an order in a quaternion algebra over a number field
(necessarily defined over a totally real field and ramified at all but one real place).
There are only 85 arithmetic triangle groups, the list given by Takeuchi [4]; for these
groups, the corresponding curve X is a Shimura curve.


2 Triangular Modular Curves 


For the remaining nonarithmetic triangle groups, there is still a quaternion algebra! 
This observation was used by Cohen—Wolfart [2] in their work on transcendence 
of values of hypergeometric functions. This relationship can be interpreted geomet- 
rically: there is a finite map X — V where V is a quaternionic Shimura variety, 
a moduli space for abelian varieties with quaternionic multiplication, suitably 
interpreted. The dimension adim(a, b, c) of V is given in terms of a, b, c; we call it 
the arithmetic dimension of (a, b, c). Nugent—Voight [3] have proven that for every 
t, the set {(a, b,c) : adim(a, b,c) = t} is finite and effectively computable. For 
example, there are 148 + 16 = 164 triples with arithmetic dimension 2. 

As with the classical modular curves, we now add level structure: we take a congruence
subgroup $\Gamma(\mathfrak{p}) \le \Gamma$ of the uniformizing group $\Gamma$ for V, and we intersect


$$ \Delta(\mathfrak{p}) = \Gamma(\mathfrak{p}) \cap \Delta. $$


By pullback, this gives a cover
$$ \phi : X(\mathfrak{p}) = \Delta(\mathfrak{p}) \backslash H \to X(1); $$


this corresponds geometrically to adding level structure to the family of
hypergeometric abelian varieties. Clark–Voight [1] have proven that the cover $\phi$ has
Galois group $\mathrm{PSL}_2(\mathbb{F}_\mathfrak{p})$ or $\mathrm{PGL}_2(\mathbb{F}_\mathfrak{p})$ (cases distinguished by a Legendre symbol);
moreover, the minimal field of definition of $\phi$ is explicitly given as an at most
quadratic extension of an explicitly given totally real abelian number field with
controlled ramification.

We call these curves $X(\mathfrak{p})$ triangular modular curves as generalizations of the
classical modular curves, and we expect that their study will be as richly rewarding
for arithmetic geometers as the classical case.
Jacobi Sums and Hecke Grössencharacters


Mark Watkins 


Abstract We give an extended abstract regarding our talk, and the associated 
Magma implementation of Jacobi sums and Hecke Grössencharacters. This builds
upon seminal work of Weil (Trans Am Math Soc 73:487—495, 1952), and makes his 
construction explicitly computable, inherently relying on his upper bound for the 
conductor. Moreover, we can go slightly further than Weil by additionally allowing 
Kummer twists of the Jacobi sums. We also note the correspondence of these 
(twisted) Jacobi sums to tame prime information for hypergeometric motives. 

Although our viewpoint and notation is derived from later work of Anderson, we 
do not use his formalism in any substantial way, and indeed the main thrust of all 
we do is already in Weil’s work. 


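Let $\theta = \sum_j n_j \langle x_j \rangle \in \mathbb{Z}[\mathbb{Q}/\mathbb{Z}]^{0}$ be an integral linear combination of nonzero
elements $x_j \in \mathbb{Q}/\mathbb{Z}$ as a formal sum, with $\sum_j n_j x_j = 0$. We put m for the least
common multiple of the denominators of the $x_j$, and write $K_\theta \subseteq \mathbb{Q}(\zeta_m)$ for the
subfield corresponding by Galois theory to modding out $(\mathbb{Z}/m\mathbb{Z})^\times$ by those u for
which the scaling $u \circ \theta = \sum_j n_j \langle u x_j \rangle$ is equal to $\theta$. Letting a be a nontrivial additive
character modulo $\mathfrak{p}$ and recalling the Gauss sum of a multiplicative character $\psi$ on
$\mathbb{F}_q$ as

$$ G_a(\psi) = -\sum_{x \in \mathbb{F}_q} \psi(x)\, a(\mathrm{Tr}\, x), $$


for ideals $\mathfrak{p}$ of $\mathbb{Q}(\zeta_m)$ we define the Jacobi sum


$$ J_\theta(\mathfrak{p}) = \prod_j G_a\big(\chi_\mathfrak{p}^{m x_j}\big)^{n_j}, $$
where this is independent of the choice of a and $\chi_\mathfrak{p}$ is the power residue symbol
$$ \chi_\mathfrak{p}(x) = \left(\frac{x}{\mathfrak{p}}\right)_m \equiv x^{(q-1)/m} \pmod{\mathfrak{p}}, $$


where q is the norm of $\mathfrak{p}$. One then has a partial L-function


$$ L_\theta(s) = \prod_{\mathfrak{p}} \left(1 - \frac{J_\theta(\mathfrak{p})}{q^s}\right)^{-1}, $$


where the product is over $\mathfrak{p} \nmid m$ in $K_\theta$.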
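
The independence of the Jacobi sum from the additive character can be seen numerically in the simplest setting. The following sketch is only an illustration and is unrelated to the Magma implementation: it works over a prime field $\mathbb{F}_p$ with $p \equiv 1 \bmod m$ (so the trace is trivial), uses floating-point complex numbers, and picks a character of order m via a primitive root rather than the power residue symbol of a specific prime ideal. All function names are assumptions for this example.

```python
import cmath


def primitive_root(p):
    """Smallest primitive root modulo an odd prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(1, p)}) == p - 1:
            return g
    raise ValueError("no primitive root found")


def character(p, m, e=1):
    """Character of F_p^* with chi(g^i) = exp(2 pi i e i / m); needs m | p - 1."""
    if (p - 1) % m:
        raise ValueError("m must divide p - 1")
    g = primitive_root(p)
    table, x = {}, 1
    for i in range(p - 1):
        table[x] = cmath.exp(2j * cmath.pi * e * i / m)
        x = x * g % p
    return table


def gauss_sum(chi, p, a=1):
    """G_a(chi) = -sum_x chi(x) exp(2 pi i a x / p), additive character indexed by a."""
    return -sum(chi[x] * cmath.exp(2j * cmath.pi * a * x / p) for x in range(1, p))


def jacobi_sum(theta, p, m, a=1):
    """Product of G_a(chi^k)^n over the pairs (n, k) in theta, encoding n * <k/m>."""
    value = 1
    for n, k in theta:
        value *= gauss_sum(character(p, m, k), p, a) ** n
    return value


if __name__ == "__main__":
    p, m = 7, 3
    theta = [(3, 1)]  # 3 * <1/3>, whose weighted sum 3 * (1/3) vanishes in Q/Z
    for a in range(1, p):
        print(a, jacobi_sum(theta, p, m, a))
```

Each individual Gauss sum changes by a root of unity when a changes, while the product attached to $\theta$ does not, because $\sum_j n_j x_j = 0$.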

In a 1952 paper [3], Weil associates a Grössencharacter to such a Jacobi sum
L-function, and in particular gets an upper bound on the modulus. This gives
us an algorithm in principle to compute said Grössencharacter, which has been
implemented in the Magma computer algebra system [1, 2]. Briefly, one first
determines the field of definition $K_\theta$ of the Grössencharacter as above, and then
the $\infty$-type in a similar manner. The upper bound on the modulus then makes it a
finite problem to recognize the correct twist in the Hecke character group (the dual
of the ray class group), and by computing $J_\theta(\mathfrak{p})$ at sufficiently many primes of small
norm we can isolate the desired twist. The possibility of including Kummer twists of
the $\theta$ was not considered directly by Weil, but fits easily into the above framework.

The resulting Jacobi sum machinery also helps explain the tame prime behavior
of hypergeometric motives, in particular giving the Euler factors when the inertia
corresponding to such primes is in fact trivialized. As an example, for the quintic
3-fold at (say) $t = t_0 \cdot p^5$ with $p \equiv 1 \pmod 5$, the Euler factor corresponds to a
Grössencharacter over $\mathbb{Q}(\zeta_5)$, with the precise twist varying with $t_0$.

This is joint work with David Roberts and Fernando Rodriguez Villegas. 
Special Values of Hypergeometric
Functions and Periods of CM Elliptic
Curves


Yifan Yang 


Abstract Let $X = X_0^6(1)/W_6$ be the quotient of the Shimura curve $X_0^6(1)$ by all
the Atkin-Lehner involutions. By realizing modular forms on X in two ways, one 
in terms of hypergeometric functions and the other in terms of Borcherds forms, 
and using Schofer’s formula for values of Borcherds forms at CM-points, we obtain 
special values of certain hypergeometric functions in terms of periods of elliptic 
curves over Q with complex multiplication. 


Let $X_0^D(N)$ be the Shimura curve associated to an Eichler order of level N in
an indefinite quaternion algebra of discriminant D over $\mathbb{Q}$. When $D = 1$, the
Shimura curve $X_0^1(N)$ is just the classical modular curve $X_0(N)$ and there are
many different constructions of modular forms on $X_0(N)$ in the literature, such as
Eisenstein series, Dedekind eta functions, Poincaré series, theta series, etc.
These explicit constructions provide practical tools for solving problems related to
classical modular curves. On the other hand, when $D \neq 1$, because of the lack of
cusps, most of the methods for classical modular curves cannot possibly be extended
to the case of general Shimura curves. However, in recent years, several methods
for Shimura curves have emerged in the literature, such as the method of
Yang [9] realizing modular forms on a Shimura curve of genus zero in terms of
solutions of its Schwarzian differential equation, the method of Voight and Willis
[7] for computing power series expansions of modular forms, the method of Nelson
[4] for computing values of modular forms using explicit Shimizu lifting [8], and
the method of Elkies [1] via K3 surfaces. Finally, there is a powerful method that
realizes modular forms on Shimura curves as Borcherds forms. To make the method
of Borcherds forms useful in practice, one would employ Schofer's formula [5] for
values of Borcherds forms at CM-points (see [2] for a sample computation). In [3],
we developed a systematic method to construct Borcherds forms and determined the
equations of all hyperelliptic Shimura curves $X_0^D(N)$ using Schofer's formula.

In [11], by combining the method of Schwarzian differential equations and the 
method of Borcherds forms, we obtain some intriguing evaluations of hypergeomet- 
ric functions, such as 


15 3. 37.74 O-43 
F oT) 8 Vii 74./8 1 
(sea 24° 4" —sa-gs) = 3 Ped (1) 


and 


(2) 


Pia oer 100 9 
37237474’ 210.56) ~ 9 4? 


where for a negative fundamental discriminant d, we let


$$ \omega_d = \frac{1}{\sqrt{|d|}} \prod_{a=1}^{|d|-1} \Gamma\!\left(\frac{a}{|d|}\right)^{\chi_d(a)\mu_d/4h_d} $$


be the Chowla–Selberg period. Here $\chi_d$ is the Kronecker character associated to
$\mathbb{Q}(\sqrt{d})$, $\mu_d$ is the number of roots of unity in $\mathbb{Q}(\sqrt{d})$, and $h_d$ is the class number
of $\mathbb{Q}(\sqrt{d})$. Note that if E is an elliptic curve over $\overline{\mathbb{Q}}$ with complex multiplication
by $\mathbb{Q}(\sqrt{d})$, then its periods are algebraic multiples of $\sqrt{\pi}\,\omega_d$. We now explain the
origin of such evaluations.

Assume that $t(\tau)$ is a modular function on $X_0^D(N)$ that takes algebraic values
at all CM-points. Then according to Shimura [6, Theorem 7.1] and Yoshida [12,
Theorem 1.2 and (1.4) of Chapter 3], the value of $t'(\tau)$ at a CM-point of discriminant
d is an algebraic multiple of $\omega_d^2$. Here we choose $t(\tau)$ to be the Hauptmodul
of $X = X_0^6(1)/W_6$, the quotient of $X_0^6(1)$ by all the Atkin–Lehner involutions,
that takes values 0, 1, and $\infty$ at the CM-points of discriminants −4, −24, and
−3, respectively. Now the Schwarzian differential equation of X is essentially a
hypergeometric differential equation (see [9]), which means that all (meromorphic)
modular forms on X can be expressed in terms of hypergeometric functions. In
particular, we have


an/4(1 — 2) 1/2 Lt) 7 1 SY 
rea a 2F\ —C2F\ ; 
Ci 24’ 24° 4 24’ 24° 4’ 


where C = —1/712w2 4 (see Lemma 8 of [10]). Manipulating this identity and 
recalling the result of Shimura and Yoshida above, we find that at a CM-point $\tau_d$ of
discriminant d, we have


i % ie ie 
F rice Be 
hi 5. 74° 4’ rita) e.g 


Wd = 
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and 


This explains the algebraicity of values of hypergeometric functions at singular
moduli. To determine actual values, we use the theory of Borcherds forms.

In [9], we find that the one-dimensional space of modular forms of weight 8 on 
X is spanned by 


8 
Oy ee) ee a) ee 
2°) 94? 24? 4’ F120, et oa oa A 


On the other hand, we can construct a Borcherds form W of weight 8. As the space of 
modular forms has dimension 1, these two modular forms must be scalar multiples 
of each other. Evaluating W at the CM-point of discriminant —4 using Schofer’s 
formula, we can determine the ratio of the two modular forms. Then evaluating at 
other CM-points, we obtain the special values of hypergeometric functions. 
CY-Operators and L-Functions


Duco van Straten 


Abstract This is a write-up of a talk given at the MATRIX conference at Creswick
in 2017 (to be precise, on Friday, January 20, 2017). It reports on work in progress
with P. CANDELAS and X. DE LA OSSA. The aim of that work is to determine, under
certain conditions, the local Euler factors of the L-functions of the fibres of a family
of varieties without recourse to the equations of the varieties in question, but solely
from the associated Picard–Fuchs equation.


1 Introduction 


It is very honourable to speak the last words in this nice conference; surely these
words are not the last on hypergeometrics, but rather represent a further exploration
into Transhypergeometria, the unknown land of our dreams. I will report on joint 
work in progress with CANDELAS and DE LA OSSA [9]. I will start with some 
motivation. 


2 Elliptic Curves Versus Rigid Calabi-Yau Threefolds 


Elliptic curves and rigid Calabi-Yau manifolds share many common features. As a
topological space, an elliptic curve is isomorphic to $S^1 \times S^1$ and a rigid Calabi-Yau
threefold is a bit like $S^3 \times S^3$, at least as far as its third cohomology is concerned.
On the arithmetic level, an elliptic curve $E$ defined over $\mathbb{Q}$ determines a two
dimensional motive $H^1(E)$ and in a similar way a rigid Calabi-Yau threefold $X$
defined over $\mathbb{Q}$ produces a two dimensional motive $H^3(X)$. There are Hodge and
p-adic realisations, giving rise to L-functions that come from classical modular
forms for some $\Gamma_0(N)$.


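Space          | Motive   | Hodge     | Frobenius           | Weil                            | Hecke
$E/\mathbb{Q}$ | $H^1(E)$ | (0 1 1 0) | $T^2 - a_pT + p$    | $\lvert a_p\rvert \le 2p^{1/2}$ | $L(H^1(E)) = L(f)$, $f \in S_2(\Gamma_0(N))$
$X/\mathbb{Q}$ | $H^3(X)$ | (1 0 0 1) | $T^2 - a_pT + p^3$  | $\lvert a_p\rvert \le 2p^{3/2}$ | $L(H^3(X)) = L(f)$, $f \in S_4(\Gamma_0(N))$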


By the great theorem of WILES [35, 36] we know that all elliptic curves over Q 
are modular, and by further development of these methods, it was shown that rigid 
Calabi—Yau threefolds defined over Q are also modular [14, 17]. 

However, there are also big differences between these two cases. Elliptic curves 
depend on a single modulus and form nice families. Classical normal forms are 
provided by the Legendre family 


$$L_\lambda:\quad y^2 = x(x-1)(x-\lambda)$$

or the Hesse family

$$H_\lambda:\quad x^3 + y^3 + z^3 + \lambda xyz = 0,$$

where $\lambda$ is the parameter.

On the other hand, as by definition $h^{1,2} = 0$, rigid Calabi-Yau spaces do not
admit any non-trivial deformations, and their occurrence is sporadic. No general 
description or construction is known for them. We refer to [23, 37] for an overview 
of the exciting bestiary. 


Question 


Which weight four cusp forms appear as modular forms of rigid Calabi-Yau
manifolds?


For example, as can be seen from consulting [23], there are many different rigid
Calabi-Yau varieties leading to the weight four cusp form for $\Gamma_0(6)$, but I do not
know of any rigid Calabi-Yau threefold realising the weight four cusp form for
$\Gamma_0(7)$.


2.1 How Can Rigid Varieties Appear in a Pencil? 


Let us look at an example. The famous Schoen quintic $X_1$, studied in [29], is the
degree 5 hypersurface in $\mathbb{P}^4$ given by the equation

$$X_1:\quad x_1^5 + x_2^5 + x_3^5 + x_4^5 + x_5^5 = 5x_1x_2x_3x_4x_5.$$

It is easily seen to have the 125 points

$$x_i^5 = 1,\qquad x_1x_2x_3x_4x_5 = 1$$

as nodal singularities. There exists a small resolution $\pi: \hat{X} \to X_1$ that replaces
each node by a projective line $\mathbb{P}^1$. $\hat{X}$ is a rigid Calabi-Yau threefold: the infinitesimal
deformations of $\hat{X}$ can be identified with the infinitesimal deformations of $X_1$ for
which the nodes lift, and there are none. For small prime numbers the Euler factors
of the L-function can be determined by counting points of $X_1$ and correcting these
counts to get the numbers of points of the resolved manifold $\hat{X}$. As the Galois
representation is determined by finitely many Euler factors, it was found that
$L(H^3(\hat{X})) = L(f)$ for some $f \in S_4(\Gamma_0(25))$, which was identified by C.
SCHOEN.


Now note that the quintic $X_1$ (and not its resolution $\hat{X}$) is a member of the even more famous
Dwork pencil

$$X_\psi:\quad x_1^5 + x_2^5 + x_3^5 + x_4^5 + x_5^5 = 5\psi\, x_1x_2x_3x_4x_5$$

that stands at the beginning of the mirror symmetry story, for which we refer to
[8, 11, 24, 33]. The third cohomology of $X_\psi$ is the direct sum of two pieces

$$H^3(X_\psi) = V \oplus F.$$


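Here the part $F$ has Hodge numbers 0 100 100 0, and the part $V$ has Hodge
numbers 1 1 1 1. The Picard-Fuchs equation for this part leads to the
hypergeometric differential equation

$$\mathcal{P} = \Theta^4 - 5^5 t\left(\Theta + \tfrac{1}{5}\right)\left(\Theta + \tfrac{2}{5}\right)\left(\Theta + \tfrac{3}{5}\right)\left(\Theta + \tfrac{4}{5}\right),\qquad t = 1/(5\psi)^5,\qquad \Theta = t\,\frac{d}{dt},$$

which describes a variation of Hodge structures (VHS) over $S := \mathbb{P}^1 \setminus \{0, 1/5^5, \infty\}$.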
At the three singular points these Hodge structures degenerate into mixed Hodge 
structures (MHS) [30]. We refer to [25] for a detailed account of (mixed) Hodge 
theory. Quite generally, the Jordan structure of the local monodromy determines the 
weight filtration. At t = 0 we have a so-called MUM-point, the monodromy has a 
maximal Jordan block. The mixed Hodge diamond looks like 


(The weight is equal to the height in the diagram, counted by putting lowest row 
at height zero; the operator N shifts two steps downwards.) The limiting mixed 
Hodge structure is an iterated extension of Tate Hodge structures and it leads to 
the extension data described in [13] that are equivalent to the so-called instanton
numbers computed in [8]. 


At $t = 1/5^5$ there is a single Jordan block of size 2 (a C-point in the terminology
of [31]). The mixed Hodge diamond for $H^3$ looks like:


So we see that the motive $\mathrm{Gr}^W_3 H^3$ is like that of a rigid Calabi-Yau.


There is one further possible degeneration of a (1, 1, 1, 1)-VHS, that does not
appear in this family, namely where there are two Jordan blocks of size 2 (a K-point
in the terminology of [31]). The mixed Hodge diamond for $H^3$ now looks like


So $\mathrm{Gr}^W_2 H^3$ is a (1, 0, 1)-Hodge structure that looks like the one appearing for
K3-surfaces with Picard number 20.

One of the motivations to look at general motivic (1, 1, 1, 1)-variations over
$S = \mathbb{P}^1 \setminus \Sigma$ is the natural appearance of weight four and weight three cusp forms
for $\Gamma_0(N)$ at the boundary points $\Sigma \subset \mathbb{P}^1$. Such motivic (1, 1, 1, 1)-variations are
expected to arise from Calabi-Yau operators.


3 Calabi-Yau Operators 


Calabi-Yau operators, as understood in [3] and [31], are operators 'like' $\mathcal{P}$. First of
all, they are fourth order Fuchsian differential operators

$$\mathcal{P} \in \mathbb{Q}[t, \Theta],\qquad \Theta = t\,\frac{d}{dt},$$

that are symplectic and have 0 as a MUM-point. If we look at it from the point
of view of differential operators, it is rather easy to satisfy these conditions, for
example by looking at operators of the form

$$\mathcal{P} = \Theta^2 P\,\Theta^2 + \Theta\,Q\,\Theta + R,$$


where $P, Q, R$ are any polynomials with $P(0) = 1$. In order to qualify as a Calabi-Yau
operator, one has to complement these easy conditions with further arithmetical
conditions that are supposed to hold if the operator is a Picard-Fuchs operator of a
1-parameter family of Calabi-Yau varieties defined over $\mathbb{Q}$. In [3] the following
integrality conditions were put forward and used to define Calabi-Yau operators.


I. The holomorphic solution $\phi_0(t)$ has an integral power-series expansion:

$$\phi_0(t) \in \mathbb{Z}[[t]].$$

II. The q-coordinate has an integral power series expansion

$$q(t) \in \mathbb{Z}[[t]].$$

III. The normalised instanton numbers become integral

$$n_0 := 1,\ n_1,\ n_2,\ \ldots,\ n_d,\ \ldots$$

after multiplication by a common denominator.


Furthermore, the case where all $n_d = 0$, $d \ge 1$, is considered as trivial, as in that case
$\mathcal{P}$ is the third symmetric power of a second order operator. In fact, it is more natural
to have coefficients in $\mathbb{Z}[1/N]$, so as to allow denominators involving a finite set of bad
primes. Currently more than 500 operators are known that seem to satisfy these three
conditions (see [2, 4, 10]), but condition III is not proven to hold in a single case. The
first condition should already imply that the operator is of geometric origin, see [5].
There are many examples of operators that satisfy I, but not II. In a good number
of cases integrality of the q-coordinate has been proven [12, 21]. For some time it
was expected that condition III was implied by I and II, until MICHAEL BOGNER
[6] found an operator that satisfies I and II, but for which III appears to fail. There
exists an unpublished paper [34] in which it is claimed that Picard-Fuchs operators
coming from families of Calabi-Yau varieties indeed satisfy these three arithmetical
conditions.

Of course, one can also look at differential operators of order different from four,
and try to single out a particularly nice sub-class of Calabi-Yau operators of arbitrary
order. For an account, we refer to [1, 6] and [7].

A particularly nice example is operator AESZ 34

$$\Theta^4 - t\,(35\Theta^4 + 70\Theta^3 + 63\Theta^2 + 28\Theta + 5) + t^2(\Theta + 1)^2(259\Theta^2 + 518\Theta + 285) - 3^2 5^2\, t^3(\Theta + 1)^2(\Theta + 2)^2$$
that was reported to us long ago by VERRILL [32]. It turned up prominently at this
conference, as it is associated to the five-fold banana FEYNMAN graph. As such,
it is part of a very nice series of Calabi-Yau operators that exist for all orders. Its
Riemann symbol (see [18, 19]) is


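$$\left\{\begin{array}{ccccc}
0 & 1/25 & 1/9 & 1 & \infty \\ \hline
0 & 0 & 0 & 0 & 1 \\
0 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 & 2 \\
0 & 2 & 2 & 2 & 2
\end{array}\right\}$$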
and the holomorphic solution has an expansion of the form 


$$\phi_0(t) = \sum_{n=0}^{\infty} t^n \sum_{i+j+k+l+m=n} \left(\frac{n!}{i!\,j!\,k!\,l!\,m!}\right)^2.$$
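As a sanity check on the operator AESZ 34 and its holomorphic solution, the following Python sketch computes the first coefficients $A_n$ of the multinomial sum directly and verifies that they satisfy the recurrence obtained by letting $\Theta$ act on $t^n$ as multiplication by $n$. The function names are illustrative only, and the constant term 285 in the quadratic factor is an assumption that the check itself confirms for the range tested.

```python
from math import factorial


def aesz34_coeff(n):
    """A_n = sum over i+j+k+l+m = n of (n! / (i! j! k! l! m!))^2."""
    total = 0
    for i in range(n + 1):
        for j in range(n + 1 - i):
            for k in range(n + 1 - i - j):
                for l in range(n + 1 - i - j - k):
                    m = n - i - j - k - l
                    multinomial = factorial(n) // (
                        factorial(i) * factorial(j) * factorial(k)
                        * factorial(l) * factorial(m)
                    )
                    total += multinomial ** 2
    return total


def p1(k):
    return 35 * k**4 + 70 * k**3 + 63 * k**2 + 28 * k + 5


def p2(k):
    # Assumed constant term 285 (verified by the vanishing residuals below).
    return (k + 1) ** 2 * (259 * k**2 + 518 * k + 285)


def p3(k):
    return 225 * (k + 1) ** 2 * (k + 2) ** 2


def residual(A, n):
    """Coefficient of t^n in the operator applied to sum_k A[k] t^k."""
    res = n**4 * A[n]
    if n >= 1:
        res -= p1(n - 1) * A[n - 1]
    if n >= 2:
        res += p2(n - 2) * A[n - 2]
    if n >= 3:
        res -= p3(n - 3) * A[n - 3]
    return res


coeffs = [aesz34_coeff(n) for n in range(6)]
residuals = [residual(coeffs, n) for n in range(6)]
print(coeffs)      # [1, 5, 45, 545, 7885, 127905]
print(residuals)   # all zero if the series is annihilated up to this order
```

The discriminant $1 - 35t + 259t^2 - 225t^3 = (1 - t)(1 - 9t)(1 - 25t)$ read off from the leading coefficients matches the singular points 1/25, 1/9, 1 in the Riemann symbol.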


As for all Calabi-Yau operators, there is a unique Frobenius basis of solutions
around 0 of the form

$$\begin{aligned}
\phi_0(t) &= f_0(t),\\
\phi_1(t) &= \log(t)\,f_0(t) + f_1(t),\\
\phi_2(t) &= \log(t)^2 f_0(t) + 2\log(t)\,f_1(t) + f_2(t),\\
\phi_3(t) &= \log(t)^3 f_0(t) + 3\log(t)^2 f_1(t) + 3\log(t)\,f_2(t) + f_3(t),
\end{aligned}$$

where $f_0(t) \in \mathbb{Z}[[t]]$, $f_i(t) \in t\,\mathbb{Q}[[t]]$ ($i = 1, 2, 3$).


The points 1/25, 1/9, 1 are C-points: there appears a single logarithm 'between'
the two equal exponents. The point $\infty$ is a K-point: there are two logarithms, again
between the two pairs of equal exponents. At each of the conifold points should
appear a weight four modular form of some level, at $\infty$ there is a weight three
modular form.


4 Euler Factors from Picard—Fuchs Operators 


It has been known from the work of DWORK [15, 16] that there is a very tight
link between the Frobenius operator and the Picard-Fuchs operator in a family
of varieties. For the sake of concreteness, let us consider as before a family $Y_t$ of
Calabi-Yau threefolds defined over $\mathbb{Q}$ with a MUM-point at 0 and let us fix a prime
$p$. Then the Frobenius operator

$$F := F_p \in \mathrm{Aut}(H^3(Y_t))$$

has a characteristic polynomial $P(T) = \det(T - F)$ of the form

$$T^4 + aT^3 + bpT^2 + ap^3T + p^6 \in \mathbb{Z}[T],$$

where

$$a = a_p(t) = -\mathrm{Tr}(F),\qquad b = b_p(t) = \left(\mathrm{Tr}(F)^2 - \mathrm{Tr}(F^2)\right)/2p,$$

from which we get the local Euler factor

$$\frac{1}{1 + a\,p^{-s} + b\,p^{1-2s} + a\,p^{3-3s} + p^{6-4s}}$$

for the L-function of $H^3(Y_t)$.


4.1 Unit Root Method 


Let us suppose that the Frobenius polynomial is irreducible, but factors over $\mathbb{Z}_p$ as

$$(T - u)(T - v)(T - p^3/v)(T - p^3/u) \in \mathbb{Z}_p[T]$$

with $\mathrm{ord}_p(u) = 0$, $\mathrm{ord}_p(v) = 1$. Then $u$ is called the unit-root and according
to DWORK [16], this unit root $u = u(t)$ can be computed from the holomorphic
solution $\phi_0(t)$ using p-adic analytic continuation of

$$\frac{\phi_0(t)}{\phi_0(t^p)}$$

and evaluation at the Teichmüller lift of $t \in \mathbb{P}^1$ (avoiding singular and supersingular
values of $t$). Dwork's unit-root method has been clarified by KATZ [20] by
formulating it in terms of crystals. In her thesis, SAMOL [26] used this method
to compute Euler factors for many families of Calabi-Yau varieties, using only
the Picard-Fuchs equation. One of the important discoveries she made was that
in many cases the method even worked at the singular points of the differential
equation, and she thus managed to determine weight four forms attached to C-points of
Calabi-Yau operators [27]. The explicit control of the p-adic analytic continuation
can sometimes be obtained from Dwork congruences on the coefficients $A_n$ of the
holomorphic solution. In the context of Calabi-Yau varieties defined by Laurent
polynomials such Dwork congruences can be shown to hold [22, 28].
4.2 Deformation Method


The type of crystals we are considering are defined over a ring $R$, which is a certain
two-dimensional regular local sub-ring of $\mathbb{Z}_p[[t]]$. On $R$ there are two operations:
the derivation

$$\Theta: R \to R,\qquad a \mapsto t\,\frac{da}{dt},$$

and the lifted Frobenius map

$$\sigma: R \to R,\qquad a(t) \mapsto a(t^p).$$

One has

$$\Theta \circ \sigma = p\,\sigma \circ \Theta.$$
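The commutation rule between the derivation $t\,d/dt$ and the lifted Frobenius $a(t) \mapsto a(t^p)$ can be checked on truncated power series. The following Python sketch stores a series as a dictionary from exponents to coefficients; the representation and the function names are choices made here for illustration only.

```python
def theta(a):
    """Apply t d/dt: the coefficient of t^n is multiplied by n."""
    return {n: n * c for n, c in a.items() if n * c != 0}


def sigma(a, p):
    """Lifted Frobenius a(t) -> a(t^p): the exponent n becomes p*n."""
    return {p * n: c for n, c in a.items()}


def scale(a, s):
    """Multiply a series by the scalar s."""
    return {n: s * c for n, c in a.items() if s * c != 0}


p = 7
a = {0: 3, 1: -2, 4: 5, 9: 1}        # 3 - 2t + 5t^4 + t^9

lhs = theta(sigma(a, p))             # derivation after Frobenius
rhs = scale(sigma(theta(a), p), p)   # p times (Frobenius after derivation)
print(lhs == rhs)                    # True
```

On each monomial $c\,t^n$ both sides give $p\,n\,c\,t^{pn}$, which is all the identity says.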


We will consider a free $R$-module $H$ of rank four, a non-degenerate symplectic
pairing

$$\langle -, - \rangle: H \times H \to R$$

and two operations

$$\nabla: H \to H,\qquad F: H \to H$$

that we call the Gauss-Manin connection and the Frobenius. The operator $\nabla$ is a
connection, so it is supposed to satisfy the appropriate Leibniz rule, whereas $F$ is
$\sigma$-linear. These three structures are required to satisfy the following compatibilities


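(i) $\Theta\langle x, y\rangle = \langle \nabla x, y\rangle + \langle x, \nabla y\rangle$.
(ii) $p^3\,\sigma\langle x, y\rangle = \langle Fx, Fy\rangle$.
(iii) $\nabla F = p\,F\nabla$.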


Furthermore, we will have a Hodge-filtration

$$\mathrm{Fil}^3 \subset \mathrm{Fil}^2 \subset \mathrm{Fil}^1 \subset \mathrm{Fil}^0 = H$$

with

$$\nabla(\mathrm{Fil}^i) \subset \mathrm{Fil}^{i-1},\qquad F(\mathrm{Fil}^i) \subset p^i H.$$

The first part of the structure may be called a polarised F-crystal; including
the filtration makes us speak about a polarised divisible Hodge F-crystal
(Fontaine-Laffaille crystal). We will call it a CY-crystal for short. Let us try to
associate such a structure to a differential operator of the form

$$\mathcal{P} = \Theta^4 + tP_1(\Theta) + t^2P_2(\Theta) + \ldots + t^rP_r(\Theta).$$


For this, we write everything out in MATRIX-form. We let

$$H = \bigoplus_{i=0}^{3} R\,\phi_i,$$

where the $\phi_i$ are abstract basis vectors, that behave with respect to differentiation as
the Frobenius basis of $\mathcal{P}$. Writing out the action of $\Theta$ on them, we can construct
the companion matrix $A(t)$ for the connection $\nabla$ on $H$ corresponding to $\mathcal{P}$:


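$$\nabla = t\,\frac{d}{dt} - A(t),$$

where $A(t)$ is of the form

$$A(t) = \begin{pmatrix} 0 & 0 & 0 & * \\ 1 & 0 & 0 & * \\ 0 & 1 & 0 & * \\ 0 & 0 & 1 & * \end{pmatrix}.$$

Because of the MUM-condition, we have

$$A(0) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$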


The matrix » of the symplectic form at t = 0 can be taken to be of the form 


We now write the Frobenius matrix in a series 


$$F = F(t) = F_0 + tF_1 + t^2F_2 + \ldots$$
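The above conditions, especially the Griffiths transversality and divisibility, lead to
a very specific form for the constant term $F_0$:

$$F_0 = \begin{pmatrix}
\xi & 0 & 0 & 0 \\
p\alpha & \xi p & 0 & 0 \\
p^2\beta & p^2\alpha & \xi p^2 & 0 \\
p^3\gamma & p^3\beta & p^3\alpha & \xi p^3
\end{pmatrix},$$

where $\xi^2 = 1$ and $\xi\beta = \alpha^2/2$. One can give an explicit formula for the series $F(t)$ as

$$F(t) = \tilde{E}(t^p)^{-1} F_0\, \tilde{E}(t) \in \mathbb{Q}[[t]]^{4\times 4},$$

where the matrix $\tilde{E}(t)$ is a modification of the fundamental matrix for the differential
equation

$$E_{jk} = \Theta^k\phi_j,\qquad E = \begin{pmatrix}
\phi_0 & \Theta(\phi_0) & \Theta^2(\phi_0) & \Theta^3(\phi_0) \\
\phi_1 & \Theta(\phi_1) & \Theta^2(\phi_1) & \Theta^3(\phi_1) \\
\phi_2 & \Theta(\phi_2) & \Theta^2(\phi_2) & \Theta^3(\phi_2) \\
\phi_3 & \Theta(\phi_3) & \Theta^2(\phi_3) & \Theta^3(\phi_3)
\end{pmatrix} \in \mathbb{Q}[[t]][\log t]^{4\times 4}.$$

This matrix reduces mod $t$ to $E^0_{jk} := \Theta^k \log^j(t)/j!$ and we set

$$\tilde{E} := (E^0)^{-1}E = E\big|_{\log(t)=0}.$$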

In all examples we have computed so far, we could make the following 
Observations 


• All terms of the series $F(t)$ are p-adically integral (depending linearly on
$\alpha, \beta, \gamma$).


¢ One can write 


p(t) 


FO= F@rt 


mod p F 


where g(t) € (Z/ pytyr4 is a polynomial matrix and A(t) is the discriminant 
of the operator Y. 

¢ The poles cancel at all singularities of Y, except for the apparent singularities. 
So if Y does not have apparent singularities, the matrix F(t) mod _p? is in fact 
polynomial. 

e We can ‘trivially’ read off 


a(t)=—TrF(t) mod p>, b(t) = (Tr(F(t)”) — Tr(F(t))*)/2p mod p? 
and these do not depend on the choice of $\alpha, \beta, \gamma$ (this was already observed in
[26]). This suffices to determine the local Euler factor at $p$ for $p > 5$.


Using this, we can compute Euler factors even at the singular points, as long as 
they are not apparent singularities. In particular, it works at the conifold points and 
we do not have to care about super-singular behaviour. For example, for the above 
mentioned operator AESZ 34 one finds characteristic polynomials of Frobenius of 
the form 


$$T\,(T - p\chi(p))\,(T^2 - a_pT + p^3)$$

for some character $\chi$. We find

t      | 1/25 | 1/9  | 1
a_7    |   32 |  -16 |  -16
a_11   |  -60 |   12 |   12
a_13   |  -34 |   38 |   38
a_17   |   42 | -126 | -126

So we recognise, using the table in [23], the weight four cusp forms 6/1 for $\Gamma_0(6)$
at $t = 1$ and $t = 1/9$, and the form 30/1 for $\Gamma_0(30)$ at $t = 1/25$.
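A quick consistency test for such a table is the Weil bound for weight four, $|a_p| \le 2p^{3/2}$, which every entry must satisfy if the quadratic factor $T^2 - a_pT + p^3$ is to come from a cusp form. The following Python sketch performs this check in exact integer arithmetic; the dictionary layout and function name are illustrative choices, not part of the original computation.

```python
# Traces a_p read off from the table; columns are t = 1/25, 1/9, 1.
traces = {
    7: (32, -16, -16),
    11: (-60, 12, 12),
    13: (-34, 38, 38),
    17: (42, -126, -126),
}


def satisfies_weil_bound(a_p, p, weight=4):
    """Check |a_p| <= 2 * p^((weight - 1) / 2) without floating point."""
    return a_p * a_p <= 4 * p ** (weight - 1)


checks = {p: [satisfies_weil_bound(a, p) for a in row]
          for p, row in traces.items()}
same_form_at_1_and_1_over_9 = all(row[1] == row[2] for row in traces.values())

print(checks)
print(same_form_at_1_and_1_over_9)
```

The second check only confirms that the columns for $t = 1/9$ and $t = 1$ agree at these four primes, as they must if both points carry the same level 6 form.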


4.3 Lifting to Higher Order 


Let us set $\alpha = \beta = 0$ and $\xi = 1$, but keep $\gamma$ as a parameter. It appears that there is
a unique choice for $\gamma$ mod $p$ for which

$$F(t) = \frac{\Phi(t)_{m(p)}}{\Delta(t)^{\ell(p)}} \mod p^4,$$

where $\ell(p)$ is a linear function of $p$ with small slope and $\Phi(t)_{m(p)}$ is a
matrix-polynomial of small degree $m(p)$ linear in $p$. For all other choices of $\gamma$ this
structure seems to get lost. By playing the same game modulo $p^5$, $p^6$, $p^7$, etc., we
can determine the number $\gamma$ modulo $p^2$, $p^3$, etc. Continuing this way, we obtain a
well-defined p-adic number $\gamma$ that goes into the Frobenius matrix at the MUM-point:


Fo = 
For the quintic and p = 11 one finds
wet es posi 71 441145210 46.11 4... 
Recall the relation between the p-adic $\zeta(3)$ and the p-adic gamma function:

$$-2\zeta_p(3) = (\log\Gamma_p)'''(0) = \Gamma_p'''(0) - \Gamma_p'(0)^3.$$


The following marvellous miracle seems to take place: 


Observation 


• $\gamma = r \cdot \zeta_p(3)$.
• $r = c_3(X)/d$, where $d$ is the degree of the mirror manifold.


For the quintic $r = 200/5 = 40$. This is reminiscent of a very similar matrix
describing the hermitian form $\langle x, \bar{y}\rangle$, where $\bar{\ }$ is the Frobenius at $\infty$, that is, complex
conjugation, and the real $\zeta(3)$ appears at the place of $\zeta_p(3)$!


This is the end of the talk and of the conference, but I feel it is the beginning of 
something great. 

During the conference we have seen some amajzing maths, we had a great taam, 
it was really a naas workshop. 


Acknowledgement Many thanks to the organisers Masha, Ling and Wadim!
A Matrix Theoretic Derivation
of the Kalman Filter


Johnathan M. Bardsley 


Abstract The Kalman filter is a data analysis method used in a wide range of 
engineering and applied mathematics problems. This paper presents a matrix- 
theoretic derivation of the method in the linear model, Gaussian measurement error 
case. Standard derivations of the Kalman filter make use of probabilistic notation 
and arguments, whereas we make use, primarily, of methods from numerical 
linear algebra. In addition to the standard Kalman filter, we derive an equivalent 
variational (optimization-based) formulation, as well as the extended Kalman filter 
for nonlinear problems. 


1 Introduction 


We start with the standard linear model with Gaussian measurement error: 
$$b = Ax + e, \qquad (1)$$

where $b \in \mathbb{R}^m$ is measured data; $A \in \mathbb{R}^{m\times n}$ is a known observation matrix; $x \in \mathbb{R}^n$
is the unknown parameter vector to be estimated; and $e \in \mathbb{R}^m$ is a zero-mean Gaussian
random vector with covariance matrix $C_e$, which we denote by $e \sim \mathcal{N}(0, C_e)$.

The standard technique for estimating x, known as least squares estimation, 
was developed by Gauss in his study of planetary motion [1]. The extension of 
least squares estimation to the case when the unknown x is also assumed to be 
a Gaussian random vector, which will be the case for us, is known as minimum 
variance estimation [4]. 
In the study of time varying phenomena, it is natural to generalize (1) as follows:

$$x_k = M_kx_{k-1} + E_k, \qquad (2)$$
$$b_k = A_kx_k + e_k, \qquad (3)$$

where Eq. (3) is defined analogously to (1) for each $k$; and in (2), $M_k \in \mathbb{R}^{n\times n}$ is the
known evolution matrix, $x_k$ is a Gaussian random vector, and $E_k \sim \mathcal{N}(0, C_{E_k})$.
The Kalman filter [2, 5] is the extension of minimum variance (and hence least
squares) estimation to the problem of sequentially estimating $\{x_1, x_2, \ldots\}$ given data
$\{b_1, b_2, \ldots\}$ arising from the model in (2), (3). For the interested reader, a discussion
of the progression of ideas from Gauss to Kalman is the subject of the excellent
paper [3].

This paper is organized as follows. First, in Sect. 2, we present the basic statistical 
definitions and results that we will need in our later discussion. In Sect. 3, we define 
the minimum variance estimator, which we then apply to (2), (3) to derive the 
Kalman filter in Sect. 4. Finally, we present an equivalent formulation of the Kalman 
filter, which we call the variational Kalman filter, as well as the extended Kalman 
filter for the case when (2), (3) contain nonlinear evolution and/or observation 
operators. 


2 Statistical Preliminaries 


Let $x = (x_1, \ldots, x_n)^T$ be a random vector with $E(x_i)$ the mean of $x_i$ and
$E((x_i - \mu_i)^2)$, where $\mu_i = E(x_i)$, its variance. The mean of $x$ is then defined
$E(x) = (E(x_1), \ldots, E(x_n))^T$, while the $n \times n$ covariance matrix of $x$ is defined

$$[\mathrm{cov}(x)]_{ij} = E((x_i - \mu_i)(x_j - \mu_j)),\qquad 1 \le i, j \le n.$$

Note that the diagonal of $\mathrm{cov}(x)$ contains the variances of $x_1, \ldots, x_n$, while the off
diagonal elements contain the covariance values. Thus if $x_i$ and $x_j$ are independent,
$[\mathrm{cov}(x)]_{ij} = 0$ for $i \ne j$.

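The $n \times m$ cross correlation matrix of the random n-vector $x$ and m-vector $y$,
which we will denote $\Gamma_{xy}$, is defined

$$\Gamma_{xy} = E(xy^T), \qquad (4)$$

where $[E(xy^T)]_{ij} = E(x_iy_j)$. If $x$ and $y$ are independent and at least one of them
has zero mean, then $\Gamma_{xy}$ is the zero matrix. Furthermore,

$$E(x) = 0 \quad\text{implies}\quad \Gamma_{xx} = \mathrm{cov}(x). \qquad (5)$$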
Finally, given an $m \times n$ matrix $A$ and a random n-vector $x$, it is not difficult to
show that

$$\mathrm{cov}(Ax) = A\,\mathrm{cov}(x)A^T. \qquad (6)$$
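Identity (6) also holds exactly when expectations are replaced by averages over a finite sample, which makes it easy to check. The following Python sketch does so in exact rational arithmetic; the sample values, the matrix and the helper names are arbitrary choices made for illustration.

```python
from fractions import Fraction


def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def sample_cov(samples):
    """Covariance matrix of a list of sample vectors (1/N normalisation)."""
    N, n = len(samples), len(samples[0])
    mean = [Fraction(sum(s[i] for s in samples), N) for i in range(n)]
    return [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / N
             for j in range(n)] for i in range(n)]


samples = [[1, 2, 0], [3, -1, 4], [0, 5, 2], [2, 2, -3]]   # four draws of x
A = [[1, 0, 2], [-1, 3, 1]]                                 # 2 x 3 matrix

# Apply A to every sample, then compare both sides of (6).
transformed = [[sum(A[i][k] * s[k] for k in range(3)) for i in range(2)]
               for s in samples]
lhs = sample_cov(transformed)
rhs = matmul(matmul(A, sample_cov(samples)), transpose(A))
print(lhs == rhs)   # True
```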


We end these preliminary comments with the probability density function of
primary interest to us in this paper, the Gaussian distribution. If $b$ is an $n \times 1$
Gaussian random vector, then its probability density function has the form

$$p_b(b; \mu, C) = \frac{1}{\sqrt{(2\pi)^n\det(C)}}\exp\left(-\frac{1}{2}(b - \mu)^TC^{-1}(b - \mu)\right), \qquad (7)$$

where $\mu \in \mathbb{R}^n$ is the mean of $b$; $C$ is an $n \times n$ symmetric positive definite covariance
matrix of $b$; and $\det(\cdot)$ denotes matrix determinant. As above, we will use the
notation $b \sim \mathcal{N}(\mu, C)$ in this case. For more details on introductory mathematical
statistics, see one of the many introductory mathematical statistics texts.


3 Minimum Variance Estimation 


First, we consider model (1). When $x$ is assumed to be deterministic, it is a standard
exercise to show that if $A$ has full column rank, the least squares estimator is given
by

$$x^{LS} = (A^TA)^{-1}A^Tb.$$

However, we are interested in the case in which $x \sim \mathcal{N}(0, C_x)$. We assume,
furthermore, that $x$ and $e$ are independent random variables. We now define the
minimum variance estimator of $x$.

Definition 1 Suppose $b$ is defined as in (1), $x \sim \mathcal{N}(0, C_x)$, and $e$ and $x$ are
independent random vectors. Then the minimum variance estimator of $x$ given $b$
has the form

$$x^{est} = Bb,$$

where $B \in \mathbb{R}^{n\times m}$ solves the optimization problem

$$B = \arg\min_{B \in \mathbb{R}^{n\times m}} E\left(\|Bb - x\|_2^2\right).$$

Because our model (1) is a linear model with Gaussian measurement error, B has an 
elegant closed form, as described in the following theorem. 
Theorem 1 If $\Gamma_{bb}$ is invertible, then the minimum variance estimator of $x$ from $b$
is given by

$$x^{est} = (\Gamma_{xb}\Gamma_{bb}^{-1})\,b.$$

Proof First, we note that

$$E(\|Bb - x\|_2^2) = \mathrm{trace}\left(E[(Bb - x)(Bb - x)^T]\right)
= \mathrm{trace}\left(BE[bb^T]B^T - BE[bx^T] - E[xb^T]B^T + E[xx^T]\right).$$

Then, using the distributive property of the trace function and the identity

$$\frac{d}{dB}\mathrm{trace}(B^TC) = \frac{d}{dB}\mathrm{trace}(BC^T) = C,$$

we see that $dE(\|Bb - x\|_2^2)/dB = 2B\Gamma_{bb} - 2\Gamma_{xb} = 0$ when

$$B = \Gamma_{xb}\Gamma_{bb}^{-1},$$

which establishes the result.


In the context of (1), and given our assumptions stated above, we can obtain a more
concrete form for the minimum variance estimator. In particular, we note that since
$x$ and $e$ are assumed to be independent, $\Gamma_{xe} = \Gamma_{ex} = 0$. Hence, using (1), we obtain

$$\Gamma_{xb} = E[x(Ax + e)^T] = \Gamma_{xx}A^T.$$

Similarly,

$$\Gamma_{bb} = E[(Ax + e)(Ax + e)^T] = A\Gamma_{xx}A^T + \Gamma_{ee}.$$

Thus, since $\Gamma_{xx} = C_x$ and $\Gamma_{ee} = C_e$, the minimum variance estimator has the form

$$x^{est} = Bb = C_xA^T(AC_xA^T + C_e)^{-1}b = (A^TC_e^{-1}A + C_x^{-1})^{-1}A^TC_e^{-1}b. \qquad (8)$$
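The equality of the two closed forms in (8) follows from the matrix inversion lemma, and it can be confirmed on a small example in exact rational arithmetic. The Python sketch below uses hand-picked matrices and a simple Gauss-Jordan inverse; all names and numerical values are assumptions made for illustration.

```python
from fractions import Fraction


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def add(P, Q):
    return [[a + b for a, b in zip(rp, rq)] for rp, rq in zip(P, Q)]


def inverse(M):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


A = [[1, 2, 0], [0, 1, -1]]                 # m = 2 observations, n = 3 unknowns
C_x = [[2, 1, 0], [1, 2, 0], [0, 0, 1]]     # symmetric positive definite
C_e = [[1, 0], [0, 2]]                      # symmetric positive definite
b = [[3], [-1]]
At = transpose(A)

# First form: C_x A^T (A C_x A^T + C_e)^{-1} b
gain = matmul(matmul(C_x, At), inverse(add(matmul(matmul(A, C_x), At), C_e)))
form1 = matmul(gain, b)

# Second form: (A^T C_e^{-1} A + C_x^{-1})^{-1} A^T C_e^{-1} b
hessian = add(matmul(matmul(At, inverse(C_e)), A), inverse(C_x))
form2 = matmul(matmul(matmul(inverse(hessian), At), inverse(C_e)), b)

print(form1 == form2)   # True
```

The first form inverts an $m \times m$ matrix and the second an $n \times n$ matrix, so in practice one would pick whichever dimension is smaller.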


We note, in passing, that (8) can also be expressed as

$$x^{est} = \arg\min_x\left\{\|Ax - b\|^2_{C_e^{-1}} + \|x\|^2_{C_x^{-1}}\right\}, \qquad (9)$$

where "arg min" denotes "argument of the minimum" and $\|x\|_C^2 \stackrel{\mathrm{def}}{=} x^TCx$.

This establishes a clear connection between minimum variance estimation and
generalized Tikhonov regularization [4]. Note in particular that if $C_e = \sigma_e^2I$ and
$C_x = \sigma_x^2I$, problem (9) can be equivalently expressed as

$$x^{est} = \arg\min_x\left\{\|Ax - b\|_2^2 + (\sigma_e^2/\sigma_x^2)\|x\|_2^2\right\},$$

which has classical Tikhonov form. This formulation is also equivalent to maximum
a posteriori (MAP) estimation.


4 The Kalman Filter 


In the previous section, we considered the stationary linear model (1), but suppose
our model now has the form (2), (3). Equation (2) is the equation of evolution for
$x_k$, with $M_k$ the $n \times n$ linear evolution matrix, and $E_k \sim \mathcal{N}(0, C_{E_k})$. In Eq. (3),
$b_k$ denotes the $m \times 1$ observed data, $A_k$ the $m \times n$ linear observation matrix, and
$e_k \sim \mathcal{N}(0, C_{e_k})$. In both equations, $k$ denotes the time index.

The problem is to estimate $x_k$ at time $k$ from $b_k$ and an estimate $x_{k-1}^{est}$ of the
state at time $k - 1$. We assume $x_{k-1}^{est} \sim \mathcal{N}(x_{k-1}, C_{k-1}^{est})$. To facilitate a more
straightforward application of the result of Theorem 1, we rewrite (2), (3). First,
define

$$x_k^a = M_kx_{k-1}^{est}, \qquad (10)$$
$$z_k = x_k - x_k^a, \qquad (11)$$
$$r_k = b_k - A_kx_k^a. \qquad (12)$$

Then, subtracting (10) from (2) and $A_kx_k^a$ from both sides of (3), and dropping the
$k$ dependence for notational simplicity, we obtain the stochastic linear equations

$$z = M(x - x^{est}) + E, \qquad (13)$$
$$r = Az + e. \qquad (14)$$

The minimum variance estimator of $z$ from $r$ given (13), (14) is then given, via
Theorem 1 (note that $z$ is a zero mean Gaussian random vector), by

$$z^{est} = \Gamma_{zr}\Gamma_{rr}^{-1}r.$$
We assume that $x - x^{est}$ is independent of $E$, and that $z = x - x^a$ is independent of
$e$. Then, from (4), (5), (6), (13) and (14), we obtain

$$\Gamma_{zz} = MC^{est}M^T + C_E := C^a, \qquad (15)$$
$$\Gamma_{zr} = C^aA^T,$$
$$\Gamma_{rr} = AC^aA^T + C_e,$$

where $C^{est}$ and $C^a$ are the covariance matrices for $x^{est}$ and $x^a$, respectively. Thus,
finally, the minimum variance estimator of $z$ is given by

$$z^{est} = C^aA^T(AC^aA^T + C_e)^{-1}r. \qquad (16)$$

From (16) and (11) we then immediately obtain the Kalman Filter estimate of $x$
given by

$$x^{est} = x^a + H(b - Ax^a), \qquad (17)$$

where

$$H = C^aA^T(AC^aA^T + C_e)^{-1} \qquad (18)$$

is known as the Kalman Gain matrix.
Finally, in order to compute the covariance of $x^{est}$, we note that by (17) and (3),

$$x^{est} = (I - HA)x^a + He + HAx,$$

where $x$ is the true state. Given our assumptions and using (6), the covariance then
takes the form

$$C^{est} = (I - HA)C^a(I - HA)^T + HC_eH^T,$$

which can be rewritten, using the identity $HC_eH^T = (I - HA)C^aA^TH^T$, in the
simplified form

$$C^{est} = C^a - HAC^a. \qquad (19)$$

Incorporating the $k$ dependence again leads directly to the Kalman filter iteration.
The Kalman Filter Algorithm

Step 0: Select initial guess $x_0^{est}$ and covariance $C_0^{est}$, and set $k = 1$.

Step 1: Compute the evolution model estimate and covariance:
A. Compute $x_k^a = M_kx_{k-1}^{est}$;
B. Compute $C_k^a = M_kC_{k-1}^{est}M_k^T + C_{E_k}$.

Step 2: Compute the Kalman filter estimate and covariance:
A. Compute the Kalman Gain $H_k = C_k^aA_k^T(A_kC_k^aA_k^T + C_{e_k})^{-1}$;
B. Compute the estimate $x_k^{est} = x_k^a + H_k(b_k - A_kx_k^a)$;
C. Compute the estimate covariance $C_k^{est} = C_k^a - H_kA_kC_k^a$.

Step 3: Update $k := k + 1$ and return to Step 1.
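One pass through Steps 1 and 2 can be sketched in a few lines of Python. The version below is a minimal illustration in exact rational arithmetic with small hand-written matrix helpers, not an efficient implementation; the function name `kalman_step` and all example values are assumptions. It also checks that the simplified covariance (19) agrees with the longer form from which it was derived.

```python
from fractions import Fraction


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def add(P, Q):
    return [[a + b for a, b in zip(rp, rq)] for rp, rq in zip(P, Q)]


def sub(P, Q):
    return [[a - b for a, b in zip(rp, rq)] for rp, rq in zip(P, Q)]


def inverse(M):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kalman_step(x_est, C_est, M, C_E, A, C_e, b):
    """One iteration of Steps 1 and 2; vectors are stored as column matrices."""
    Mt, At = transpose(M), transpose(A)
    x_a = matmul(M, x_est)                                   # Step 1A
    C_a = add(matmul(matmul(M, C_est), Mt), C_E)             # Step 1B
    S = add(matmul(matmul(A, C_a), At), C_e)
    H = matmul(matmul(C_a, At), inverse(S))                  # Step 2A
    x_new = add(x_a, matmul(H, sub(b, matmul(A, x_a))))      # Step 2B
    C_new = sub(C_a, matmul(matmul(H, A), C_a))              # Step 2C
    return x_new, C_new, H, C_a


# Scalar example: prior N(0, 1), no model noise, unit measurement noise, b = 2.
x1, C1, H1, _ = kalman_step([[0]], [[1]], [[1]], [[0]], [[1]], [[1]], [[2]])
print(x1, C1)   # estimate 1 with variance 1/2

# Two-state example (position, velocity) observing the position only.
M = [[1, 1], [0, 1]]
A = [[1, 0]]
C_E = [[Fraction(1, 4), 0], [0, Fraction(1, 4)]]
C_e = [[Fraction(1, 2)]]
x2, C2, H2, Ca2 = kalman_step([[0], [1]], [[1, 0], [0, 1]], M, C_E, A, C_e, [[2]])

# Long form (I - HA) C^a (I - HA)^T + H C_e H^T of the covariance.
I_HA = sub([[1, 0], [0, 1]], matmul(H2, A))
long_form = add(matmul(matmul(I_HA, Ca2), transpose(I_HA)),
                matmul(matmul(H2, C_e), transpose(H2)))
print(long_form == C2)   # True
```

In floating point the long form is often preferred because it preserves symmetry and positive semi-definiteness better; in exact arithmetic the two agree.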


4.1 A Variational Formulation of the Kalman Filter 


As in the stationary case (see (8), (9)), we can rewrite Eq. (16) in the form

$$z^{est} = (A^TC_e^{-1}A + (C^a)^{-1})^{-1}A^TC_e^{-1}r,$$

which yields, using (11), the Kalman filter estimate

$$\begin{aligned}
x^{est} &= x^a + [A^TC_e^{-1}A + (C^a)^{-1}]^{-1}A^TC_e^{-1}(b - Ax^a),\\
&= \arg\min_x\left\{\ell(x) \stackrel{\mathrm{def}}{=} \frac{1}{2}(b - Ax)^TC_e^{-1}(b - Ax) + \frac{1}{2}(x - x^a)^T(C^a)^{-1}(x - x^a)\right\}.
\end{aligned}$$

It can be shown using a Taylor series argument that

$$x^{est} = x^a - \nabla^2\ell(x^a)^{-1}\nabla\ell(x^a), \qquad (20)$$

where $\nabla\ell$ and $\nabla^2\ell$ denote the gradient and Hessian of $\ell$ respectively, and are given
by

$$\nabla\ell(x) = -A^TC_e^{-1}(b - Ax) + (C^a)^{-1}(x - x^a),$$
$$\nabla^2\ell(x) = A^TC_e^{-1}A + (C^a)^{-1}.$$

By the matrix inversion lemma, we have

$$(A^TC_e^{-1}A + (C^a)^{-1})^{-1} = C^a - C^aA^T(AC^aA^T + C_e)^{-1}AC^a.$$

Then from Eqs. (18) and (19), we obtain the interesting fact that

$$C^{est} = \nabla^2\ell(x)^{-1}. \qquad (21)$$


This allows us to define the following equivalent formulation of the Kalman filter, 
which we call the variational Kalman filter. 


The Variational Kalman Filter Algorithm

Step 0: Select initial guess $x_0^{est}$ and covariance $C_0^{est}$, and set $k = 1$.

Step 1: Compute the evolution model estimate and covariance:
A. Compute $x_k^a = M_kx_{k-1}^{est}$;
B. Compute $C_k^a = M_kC_{k-1}^{est}M_k^T + C_{E_k}$.

Step 2: Compute the Kalman filter estimate and covariance:
A. Compute the estimate $x_k^{est} = \arg\min_x \ell(x)$;
B. Compute the estimate covariance $C_k^{est} = \nabla^2\ell(x)^{-1}$.

Step 3: Update $k := k + 1$ and return to Step 1.


A natural question is, what is the use of this equivalent formulation of the Kalman 
filter? Theoretically there is no benefit gained in using the variational Kalman 
filter if the estimate and its covariance are computed exactly. However, with the 
variational approach, the filter estimate, and even its covariance, can be computed 
approximately using an iterative minimization method, such as conjugate gradient. 
This is particularly important for large-scale problems where the exact Kalman filter 
is prohibitively expensive to compute. 
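To make the equivalence concrete, the following Python sketch minimizes the quadratic function of the variational formulation with a plain conjugate gradient iteration and compares the result with the gain-based Kalman estimate. It is a toy example under stated assumptions: two state variables, a single scalar measurement and a diagonal prior covariance, all chosen here so that no general matrix inverse is needed. In exact arithmetic, conjugate gradient terminates after at most $n$ steps, so the two estimates agree exactly.

```python
from fractions import Fraction as Fr


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(P, v):
    return [dot(row, v) for row in P]


def conjugate_gradient(hess, g, n_iter):
    """Solve hess * x = g for a symmetric positive definite matrix hess."""
    x = [Fr(0)] * len(g)
    r = list(g)                 # residual g - hess * x for the start x = 0
    d = list(r)                 # first search direction
    for _ in range(n_iter):
        rr = dot(r, r)
        if rr == 0:             # already converged
            break
        hd = matvec(hess, d)
        step = rr / dot(d, hd)
        x = [xi + step * di for xi, di in zip(x, d)]
        r = [ri - step * hi for ri, hi in zip(r, hd)]
        beta = dot(r, r) / rr
        d = [ri + beta * di for ri, di in zip(r, d)]
    return x


A = [Fr(1), Fr(2)]            # 1 x 2 observation matrix (one measurement)
c_e = Fr(1)                   # measurement noise variance
C_a_diag = [Fr(2), Fr(1)]     # diagonal of the prior covariance C^a
x_a = [Fr(1), Fr(0)]          # evolution model estimate
b = Fr(5)                     # observed datum

# Variational route: setting the gradient to zero gives hess * x = rhs with
# hess = A^T A / c_e + (C^a)^{-1} and rhs = A^T b / c_e + (C^a)^{-1} x^a.
hess = [[A[i] * A[j] / c_e + (1 / C_a_diag[i] if i == j else 0)
         for j in range(2)] for i in range(2)]
rhs = [A[i] * b / c_e + x_a[i] / C_a_diag[i] for i in range(2)]
x_var = conjugate_gradient(hess, rhs, n_iter=2)

# Gain-based route: x^a + H (b - A x^a) with H = C^a A^T / (A C^a A^T + c_e).
S = sum(A[i] * C_a_diag[i] * A[i] for i in range(2)) + c_e
H = [C_a_diag[i] * A[i] / S for i in range(2)]
x_kal = [x_a[i] + H[i] * (b - dot(A, x_a)) for i in range(2)]

print(x_var == x_kal)   # True
```

For a large-scale problem one would stop the iteration well before $n$ steps and accept an approximate minimizer, which is where the variational form pays off.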


4.2 The Extended Kalman Filter


The extended Kalman filter is the extension of the Kalman filter when (2), (3) are
replaced by

$$x_k = \mathcal{M}(x_{k-1}) + E_k, \qquad (22)$$
$$b_k = \mathcal{A}(x_k) + e_k, \qquad (23)$$

where $\mathcal{M}$ and $\mathcal{A}$ are (possibly) nonlinear functions. The extended Kalman filter is
obtained by the following simple modification of either of the above algorithms: in
Step 1, A use, instead, $x_k^a = \mathcal{M}(x_{k-1}^{est})$, and define

$$M_k = \frac{\partial\mathcal{M}(x_{k-1}^{est})}{\partial x}\quad\text{and}\quad A_k = \frac{\partial\mathcal{A}(x_k^a)}{\partial x}, \qquad (24)$$

where $\frac{\partial f}{\partial x}$ denotes the Jacobian of $f$.
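When analytic Jacobians are not at hand, the linearizations in (24) can be approximated by finite differences. The Python sketch below does this for a hypothetical pendulum-like evolution map and a hypothetical scalar observation map; both maps, the step size and all names are assumptions introduced purely for illustration. The resulting matrices would then be used in place of $M_k$ and $A_k$ in either algorithm above.

```python
from math import sin, cos


def jacobian(f, x, h=1e-6):
    """Central-difference approximation of the Jacobian of f at x."""
    fx = f(x)
    J = [[0.0] * len(x) for _ in range(len(fx))]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(len(fx)):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def evolve(x, dt=0.1):
    """Hypothetical nonlinear evolution map: one Euler step of a pendulum."""
    angle, velocity = x
    return [angle + dt * velocity, velocity - dt * sin(angle)]


def observe(x):
    """Hypothetical nonlinear observation map: horizontal displacement."""
    return [sin(x[0])]


x_prev_est = [0.3, -0.2]                # estimate from the previous step
x_a = evolve(x_prev_est)                # modified Step 1, A
M_k = jacobian(evolve, x_prev_est)      # Jacobian of the evolution map
A_k = jacobian(observe, x_a)            # Jacobian of the observation map

# Analytic Jacobians of the two toy maps, for comparison.
M_exact = [[1.0, 0.1], [-0.1 * cos(x_prev_est[0]), 1.0]]
A_exact = [[cos(x_a[0]), 0.0]]
print(M_k, A_k)
```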
5 Conclusions


We have presented a derivation of the Kalman filter that utilizes matrix analysis 
techniques as well as the Bayesian statistical approach of minimum variance 
estimation. In addition, we presented an equivalent variational formulation, which 
we call the variational Kalman filter, as well as the extended Kalman filter for 
nonlinear problems. 
Approximate Bayesian Computational
Methods for the Inference of Unknown
Parameters


Yugin Ke and Tianhai Tian 


Abstract Recent advances in biology, economics, engineering and physical sci- 
ences have generated a large number of mathematical models for describing the 
dynamics of complex systems. A key step in mathematical modelling is to estimate 
model parameters in order to realize experimental observations. However, it is 
difficult to derive the analytical density functions in the Bayesian methods for these 
mathematical models. During the last decade, approximate Bayesian computation 
(ABC) has been developed as a major method for the inference of parameters in 
mathematical models. A number of new methods have been designed to improve 
the efficiency and accuracy of ABC. Theoretical studies have also been conducted 
to investigate the convergence property of these methods. In addition, these methods 
have been applied to a wide range of deterministic and stochastic models. This 
chapter gives a brief review of the main ABC algorithms and various improvements. 


1 Introduction 


Since more and more natural and social science problems involve the uncertainty 
in observations, statistical models and parameter inference play an important role 
in the development of mathematical methods for studying real-world problems. 
In particular, the era of big data has generated a huge amount of data whose
volume is increasing at a very fast speed. Mathematical and statistical models are 
becoming more and more complex in terms of the network size and regulatory 
relationships. Thus effective and efficient methods are strongly needed to infer 
unknown parameters in these models in order to reduce the simulation errors against 
the experimental data. 

There are two major types of inference methods, namely the optimization 
methods and Bayesian statistical methods. The optimization methods are designed 
to minimize an objective function by searching for parameters within a given
parameter space in a directed manner. The inferred set of parameters produces 
the best fit to the experimental data [30]. A variety of effective approaches have 
been developed in recent years. Among them, the genetic algorithm is a popular 
and effective approach and has been widely applied to various models [49]. These 
methods all share two main ingredients: a cost function for specifying the distance 
between simulated data and experimental data and an optimization algorithm for 
searching for parameters in order to optimize the cost function. However, when the 
landscape of the cost function is complex, it is difficult for these methods to find 
the global optimum. To tackle this challenge, the global optimization methods have 
been proposed to explore the complex surfaces as widely as possible. Comparison 
studies have been conducted to examine the efficiency of several global optimization 
algorithms for the test models [21]. 

Compared with the optimization methods, the Bayesian inference methods can 
estimate the probability distributions of parameters by using Bayes' rule to 
update the prior probability estimates. In addition, Bayesian methods are more 
robust in dealing with stochastic models and/or experimental data with noise 
[22, 54]. In recent years Bayesian methods have been successfully used in a diverse 
range of fields and show promise for further applications [47]. The recent advances in 
approximate Bayesian computation (ABC) provide effective methods that do not 
require the likelihood function to be available [7, 16, 32, 41, 51]. This 
chapter provides a brief review of the recent developments in ABC, including the 
rejection ABC, regression ABC, Markov chain Monte Carlo (MCMC) ABC and 
sequential Monte Carlo (SMC) ABC. We also discuss the relevant improvements 
and extensions of these methods, such as the choice of summary statistics. 


2 Principle of Bayesian Inference 


For the Bayesian inference problems, model parameters are treated as random 
quantities along with the observation data. The Bayesian inference involves the 
estimation of the posterior probability 


$$p(\theta \mid y) = \frac{p(y \mid \theta)\,\pi(\theta)}{p(y)} \propto p(y \mid \theta)\,\pi(\theta), \tag{1}$$


where $y$ is the observation data and $\theta$ is the parameter vector of the model 
($\theta \in \Theta \subset \mathbb{R}^q$, $q \ge 1$). In addition, $\pi(\theta)$ is the prior distribution representing the prior 
beliefs about the parameters under investigation, and $p(y \mid \theta)$ is the likelihood function 
of the parameter $\theta$. The marginal distribution, defined by 


$$p(y) = \int_{\theta \in \Theta} p(y \mid \theta)\,\pi(\theta)\,d\theta \tag{2}$$
often involves a high-dimensional integral, and $p(\theta \mid y)$ is the posterior probability 
distribution, which expresses the uncertainty regarding $\theta$ conditional on the observed 
experimental data $y$. All Bayesian inference about $\theta$ is based on the estimated 
$p(\theta \mid y)$. However, the integrations which produce the Bayesian quantities of 
interest (such as marginal posteriors, marginal moments, and probability intervals) 
can only be performed analytically when the density function $p(y \mid \theta)$ is available, 
which can be achieved only for relatively simple cases. 
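
As a worked illustration of Eqs. (1) and (2) in one of the simple cases where the integral is available in closed form, consider a binomial likelihood with a Beta prior. This example is ours and is not taken from the works cited in this chapter; the symbols $n$, $a$, $b$ and the Beta function $B(\cdot,\cdot)$ are introduced only for this illustration.

```latex
\begin{align*}
p(y \mid \theta) &= \binom{n}{y}\,\theta^{y}(1-\theta)^{n-y}, \qquad
\pi(\theta) = \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}, \qquad \theta \in (0,1), \\
p(y) &= \int_0^1 p(y \mid \theta)\,\pi(\theta)\,d\theta
      = \binom{n}{y}\,\frac{B(a+y,\; b+n-y)}{B(a,b)}, \\
p(\theta \mid y) &= \frac{p(y \mid \theta)\,\pi(\theta)}{p(y)}
      = \frac{\theta^{a+y-1}(1-\theta)^{b+n-y-1}}{B(a+y,\; b+n-y)} .
\end{align*}
```

The posterior is again a Beta distribution, with parameters $a+y$ and $b+n-y$. For models without such conjugate structure the integral in Eq. (2) generally has no closed form, which motivates the sampling methods described next.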

When the density function $p(y \mid \theta)$ is available, the classic Metropolis-Hastings 
algorithm can be applied to generate a Markov chain of the parameters, as given below. 


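Algorithm 1 Metropolis-Hastings algorithm 
Given the observation data $y_{\mathrm{obs}}$, proposal distribution $q(\cdot)$, and an initial sample from the prior 
distribution $\theta^{(0)} \sim \pi(\theta)$. 


At iteration $i \ge 0$ 


1. Generate a sample from the proposal distribution $\theta' \sim q(\theta \mid \theta^{(i)})$. 
2. Draw a sample from the uniform distribution $u \sim U(0, 1)$ and calculate the ratio 


$$\alpha = \min\left(1, \frac{\pi(\theta')\, p(y_{\mathrm{obs}} \mid \theta')\, q(\theta^{(i)} \mid \theta')}{\pi(\theta^{(i)})\, p(y_{\mathrm{obs}} \mid \theta^{(i)})\, q(\theta' \mid \theta^{(i)})}\right). \tag{3}$$


3. If $u < \alpha$, accept the sample as $\theta^{(i+1)} = \theta'$; otherwise reject the sample and set $\theta^{(i+1)} = \theta^{(i)}$. 
4. Repeat steps 1 to 3 until the required number of posterior samples is obtained. 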
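
The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under assumptions of our own, not an implementation from the cited literature: the data are taken to be i.i.d. $N(\theta, 1)$, the prior is $N(0, 10^2)$, and the proposal is a symmetric Gaussian random walk, so that the proposal terms in Eq. (3) cancel. The function names and default values are hypothetical.

```python
import math
import random


def log_prior(theta, prior_sd=10.0):
    # Log density of the assumed N(0, prior_sd^2) prior, up to a constant.
    return -0.5 * (theta / prior_sd) ** 2


def log_likelihood(theta, y_obs):
    # Log density of i.i.d. N(theta, 1) observations, up to a constant.
    return -0.5 * sum((y - theta) ** 2 for y in y_obs)


def metropolis_hastings(y_obs, n_iter=20000, step=0.5, seed=0):
    rng = random.Random(seed)
    theta = rng.gauss(0.0, 10.0)           # theta^(0) drawn from the prior
    chain = []
    for _ in range(n_iter):
        proposal = rng.gauss(theta, step)  # theta' ~ q(. | theta^(i))
        # Symmetric proposal: the q terms of Eq. (3) cancel in the ratio.
        log_ratio = (log_prior(proposal) + log_likelihood(proposal, y_obs)
                     - log_prior(theta) - log_likelihood(theta, y_obs))
        alpha = math.exp(min(0.0, log_ratio))
        if rng.random() < alpha:
            theta = proposal               # accept the proposed value
        chain.append(theta)                # on rejection the old value is kept
    return chain
```

Working with logarithms avoids numerical underflow of the likelihood. In this toy setting the posterior is Gaussian with mean $\sum_i y_i / (n + 0.01)$, which gives a simple check of the chain after discarding a burn-in period.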


Based on the classic Metropolis-Hastings algorithm, a number of more sophisticated 
methods have been designed, such as the Markov chain Monte Carlo 
(MCMC), the importance sampling (IS), and the sequential Monte Carlo (SMC) 
[20, 42]. The MCMC sampling methods usually break a high-dimensional problem 
into a number of lower-dimensional problems and generate a sample of dependent 
or correlated draws which can be treated as a realization of a Markov chain with 
equilibrium distribution equal to $p(\theta \mid y)$. Once the convergence to $p(\theta \mid y)$ occurs, 
any subsequent simulated value can be viewed as a sample from $p(\theta \mid y)$ and all 
these samples are used to estimate the posterior quantities of interest. 

An alternative approach is Gibbs sampling, which can be used if the conditional 
distribution of each parameter given the others is available. In the basic version, Gibbs sampling is a special 
case of the Metropolis-Hastings algorithm. However, in its extended versions, it 
can be considered as a framework for sampling each variable (or more 
generally, each group of variables) from a number of variables in turn. It can 
also be incorporated into the Metropolis-Hastings algorithm (or other methods) to 
implement one or more of the sampling steps. The detail of this algorithm is given 
below. 
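
Algorithm 2 Gibbs sampling 
Given the observation data $y_{\mathrm{obs}}$, and an initial sample from the prior distribution 
$\theta^{(0)} = (\theta_1^{(0)}, \ldots, \theta_p^{(0)}) \sim \pi(\theta)$. 


At iteration $i \ge 0$ 


1. Generate a sample for $\theta_1^{(i+1)}$ using 
$\theta_1^{(i+1)} \sim \pi(\theta_1 \mid \theta_2^{(i)}, \ldots, \theta_p^{(i)}, y_{\mathrm{obs}})$. 
2. For $j = 2, \ldots, p - 1$, sample $\theta_j^{(i+1)}$ using 
$\theta_j^{(i+1)} \sim \pi(\theta_j \mid \theta_1^{(i+1)}, \ldots, \theta_{j-1}^{(i+1)}, \theta_{j+1}^{(i)}, \ldots, \theta_p^{(i)}, y_{\mathrm{obs}})$. 
3. Generate a sample for $\theta_p^{(i+1)}$ using 
$\theta_p^{(i+1)} \sim \pi(\theta_p \mid \theta_1^{(i+1)}, \ldots, \theta_{p-1}^{(i+1)}, y_{\mathrm{obs}})$. 
4. Repeat steps 1 to 3 until the required number of posterior samples is obtained. 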
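
A minimal sketch of Algorithm 2 for $p = 2$ is given below. As an assumption made purely for illustration, the target is a standard bivariate normal distribution with correlation `rho`, standing in for a posterior whose full conditionals are known: each conditional is then $N(\rho\,\theta_{\text{other}},\, 1 - \rho^2)$. The function name and the fixed starting point are hypothetical choices.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter=20000, seed=0):
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)   # standard deviation of each conditional
    theta1, theta2 = 0.0, 0.0             # arbitrary starting point (assumption)
    samples = []
    for _ in range(n_iter):
        # Update theta1 given the current theta2.
        theta1 = rng.gauss(rho * theta2, cond_sd)
        # Update theta2 given the freshly updated theta1.
        theta2 = rng.gauss(rho * theta1, cond_sd)
        samples.append((theta1, theta2))
    return samples
```

Every draw is accepted, which is what distinguishes the Gibbs update from a general Metropolis-Hastings step. The price is that successive draws become strongly correlated when `rho` is close to one.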


Although these Bayesian inference methods are effective, they are based on the 
availability of the likelihood function. However, it may be difficult to derive the 
likelihood function directly for many complex models. For example, the analytical 
density function may not be available, or it may be expensive to calculate the 
likelihood. In some cases, the observed experimental data are insufficient to obtain 
a tractable likelihood. This intractability prohibits the direct implementation of a 
generic MCMC algorithm. 


3 Rejection ABC Method 


To deal with complex models without analytical likelihood, a number of algorithms 
have been developed during the past two decades, which are referred to as the 
likelihood-free inference or Approximate Bayesian Computation (ABC). The ABC 
method is based on the following intuition: namely if a sample of the unknown 
parameter produces the simulation that matches the observed dataset, this sample 
should be close to the exact value of the parameter. Conversely, if the simulated 
dataset differs substantially from the observed data, this sample should not be 
considered as the estimate of the parameter. Thus the method strongly relies on the 
metric to determine the distance between simulated dataset and observed dataset. 
In the late 90s, ABC was first introduced as a rejection technique bypassing the 
computation of the likelihood function [48]. Later, Pritchard et al. proposed a 
generalisation based on an approximation of the target [40]. In recent years, the ABC 
methods have been proposed with various improvements and have been applied to a 
wide range of application fields, such as population genetics, ecology, epidemiology 
and systems biology [2, 3, 17, 25, 27]. 
The ABC spirit is based on the following algorithm [43]. 


Algorithm 3 Likelihood-free rejection sampling 


Given the observation data $y_{\mathrm{obs}}$, and prior distribution $\pi(\theta)$. 


1. Generate a sample from the prior distribution $\theta' \sim \pi(\theta)$. 

2. Simulate the model using $\theta'$ to get a dataset $x \sim p(x \mid \theta')$. 

3. Accept the sample $\theta'$ if $x = y_{\mathrm{obs}}$, otherwise reject it. 

4. Repeat the above steps until the required number of posterior samples is obtained. 
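
For discrete data of low dimension, Algorithm 3 can be run exactly as stated. The sketch below assumes, for illustration only, a binomial model with a uniform prior on the success probability; the function names are hypothetical. In this setting the accepted samples follow the exact posterior, which is Beta($y_{\mathrm{obs}} + 1$, $n - y_{\mathrm{obs}} + 1$).

```python
import random


def simulate_binomial(theta, n_trials, rng):
    # Number of successes in n_trials independent Bernoulli(theta) trials.
    return sum(1 for _ in range(n_trials) if rng.random() < theta)


def exact_rejection_sampler(y_obs, n_trials, n_accept=2000, seed=0):
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_accept:
        theta = rng.random()                         # step 1: theta' ~ U(0, 1)
        x = simulate_binomial(theta, n_trials, rng)  # step 2: x ~ p(x | theta')
        if x == y_obs:                               # step 3: exact match only
            accepted.append(theta)
    return accepted
```

With a uniform prior every outcome $0, \ldots, n$ is equally likely a priori, so the acceptance probability is $1/(n+1)$. This shows how quickly exact matching becomes impractical as the data grow.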


In that paper, Rubin presented this algorithm only as an intuitive way to understand 
posterior distributions from a frequentist perspective, rather than as a tool for 
inferring models whose likelihood function is not available [43]. Then Tavaré 
et al. proposed an implementation of the rejection algorithm for the Bayesian 
inference of parameters in population genetics. When the data are discrete and of 
low dimension, this algorithm is effective. However, the probability of acceptance 
for a sample is usually very low. 

As mentioned earlier, the rejection algorithm depends on a metric to measure 
the distance between the simulated and observed data. For inference problems 
with continuous distributions, or when the datasets are high-dimensional, it may be 
necessary to use summary statistics to reduce the dimensionality. Pritchard et al. 
suggested the following prototype rejection-ABC algorithm in a population genetics 
setting [40]. 


Algorithm 4 Rejection ABC method 


Given the observation data $y_{\mathrm{obs}}$, prior distribution $\pi(\theta)$, summary statistics $s(\cdot)$, tolerance level 
$\varepsilon > 0$, and distance function $\rho(\cdot, \cdot)$. 


1. Generate a sample from the prior distribution $\theta' \sim \pi(\theta)$. 

2. Simulate the model using $\theta'$ to get a dataset $x \sim p(x \mid \theta')$. 

3. Calculate the distance between the simulation and experimental data $\rho(s(x), s_{\mathrm{obs}})$ based on the 
given summary statistics $s(\cdot)$. 

4. Accept the sample $\theta'$ if 


$$\rho(s(x), s_{\mathrm{obs}}) < \varepsilon.$$


Otherwise reject the sample. 
5. Repeat steps 1 to 4 until the required number of posterior samples is obtained. 
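
Algorithm 4 translates almost line by line into code. The sketch below is an illustration under assumptions of our own: the model is i.i.d. $N(\theta, 1)$, the prior is uniform on $(-5, 5)$, the summary statistic is the sample mean, and the distance is the absolute difference. All function names and default values are hypothetical.

```python
import random


def simulate(theta, n, rng):
    # Draw a synthetic dataset of size n from the assumed N(theta, 1) model.
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    # Summary statistic s(.): the sample mean.
    return sum(data) / len(data)


def rejection_abc(y_obs, n_accept=500, eps=0.1, seed=0):
    rng = random.Random(seed)
    s_obs = summary(y_obs)
    accepted = []
    while len(accepted) < n_accept:
        theta = rng.uniform(-5.0, 5.0)      # step 1: sample from the prior
        x = simulate(theta, len(y_obs), rng)  # step 2: simulate a dataset
        if abs(summary(x) - s_obs) < eps:   # steps 3-4: distance below tolerance
            accepted.append(theta)
    return accepted
```

Decreasing `eps` tightens the approximation but lowers the acceptance rate roughly in proportion, which is the trade-off behind the tolerance level.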


The basic idea of ABC is to use summary statistics with a small tolerance 
to produce a good approximation of the posterior distribution. The output is a set of 
samples of parameters from the distribution $p(\theta \mid \rho(s(x), s_{\mathrm{obs}}) < \varepsilon)$. The choice 
of summary statistics is very important, and we will discuss it later. In addition, the 
tolerance $\varepsilon$ in Algorithm 4 may determine the efficiency of ABC. The basic ABC 
rejection algorithm may result in long computing time when the prior distribution is 
far away from the posterior distribution. In addition, there is no learning process in this 
algorithm, and thus no information is obtained from the previously accepted 
samples of parameters. When the search space is complex, the convergence rate of 
this algorithm may be very slow. 


4 Regression ABC 


To improve the efficiency of the rejection-ABC algorithm, Beaumont et al. [6] 
introduced the regression approach by explicitly modeling the discrepancy between 
the simulated summary statistics and that of the observed data through the following 
algorithm. 


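Algorithm 5 Regression ABC 


Given the observation data $y_{\mathrm{obs}}$, prior distribution $\pi(\theta)$, summary statistics $s(\cdot)$, tolerance level $\varepsilon$, 
and distance function $\rho(\cdot)$. 


1. Generate a sample from the prior distribution $\theta^{(i)} \sim \pi(\theta)$. 

2. Simulate the model using $\theta^{(i)}$ to get a dataset $x^{(i)} \sim p(x \mid \theta^{(i)})$ and compute the summary 
statistics $s^{(i)} = s(x^{(i)})$. 

3. Repeat steps 1 and 2, until $N$ pairs $\{\theta^{(i)}, s^{(i)}\}$ are obtained. 

4. Associate each pair $(\theta^{(i)}, s^{(i)})$ with a weight $w^{(i)} \propto K_\varepsilon(\rho(s^{(i)} - s_{\mathrm{obs}}))$. The weighting kernel 
can be selected as: 


$$K_\varepsilon(t) = \begin{cases} \varepsilon^{-1}\left(1 - (t/\varepsilon)^2\right), & t \le \varepsilon, \\ 0, & t > \varepsilon. \end{cases}$$


5. Apply a regression model to the $n$ points which have nonzero weights to obtain an estimate of 
$E(\theta \mid s(x) = s)$, denoted as $\hat{m}(s)$. 
6. Adjust each sample to 


$$\theta^{*(i)} = \hat{m}(s_{\mathrm{obs}}) + \theta^{(i)} - \hat{m}(s^{(i)}).$$


7. Use $\{\theta^{*(i)}, w^{(i)}\}$ to approximate the posterior distribution. 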


Here the samples $\theta^{(i)}$ are adjusted with weights $w^{(i)} > 0$ to account for the 
difference between simulated summary statistics and that of the observed data. 
Beaumont et al. [6] suggested a local linear model in the region of $s_{\mathrm{obs}}$, given by 


$$\theta^{(i)} = m(s^{(i)}) + e^{(i)},$$


$$m(s) = \alpha + \beta^{T}(s - s_{\mathrm{obs}}),$$

where the $e^{(i)}$ are zero-mean random variates with common variance, and $m(s)$ is the 
conditional expectation of $\theta$ given $s$. 
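
The local linear adjustment can be sketched for a scalar parameter and a scalar summary statistic as follows. This is an illustrative sketch under assumptions of our own (i.i.d. $N(\theta, 1)$ data, a uniform prior on $(-5, 5)$, the sample mean as summary statistic, and a weighted least squares fit written out by hand); it is not the implementation of Beaumont et al. [6], and all names are hypothetical.

```python
import random


def epanechnikov(t, eps):
    # K_eps(t) = (1 - (t / eps)^2) / eps for t <= eps, and 0 otherwise.
    return (1.0 - (t / eps) ** 2) / eps if t <= eps else 0.0


def regression_abc(s_obs, n_data, n_sim=20000, eps=0.5, seed=0):
    rng = random.Random(seed)
    thetas, diffs, weights = [], [], []
    for _ in range(n_sim):
        theta = rng.uniform(-5.0, 5.0)                   # sample from the prior
        s = sum(rng.gauss(theta, 1.0) for _ in range(n_data)) / n_data
        w = epanechnikov(abs(s - s_obs), eps)
        if w > 0.0:                                      # keep nonzero weights only
            thetas.append(theta)
            diffs.append(s - s_obs)
            weights.append(w)
    # Weighted least squares fit of theta = alpha + beta * (s - s_obs).
    w_sum = sum(weights)
    d_bar = sum(w * d for w, d in zip(weights, diffs)) / w_sum
    t_bar = sum(w * t for w, t in zip(weights, thetas)) / w_sum
    cov = sum(w * (d - d_bar) * (t - t_bar)
              for w, d, t in zip(weights, diffs, thetas))
    var = sum(w * (d - d_bar) ** 2 for w, d in zip(weights, diffs))
    beta = cov / var
    # m(s_obs) + theta - m(s) simplifies to theta - beta * (s - s_obs).
    adjusted = [t - beta * d for t, d in zip(thetas, diffs)]
    return adjusted, weights
```

Because the adjustment removes the linear trend of $\theta$ in $s$, a fairly large tolerance can be used without inflating the spread of the adjusted sample. In this toy model the weighted variance of the adjusted values should be close to the exact posterior variance $1/n$.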

In this approach, the choice of $\varepsilon$ involves a bias-variance trade-off: an 
increase of $\varepsilon$ reduces the variance because of a larger sample size for fitting the 
regression. However, it also increases the bias arising from departures from 
linearity and homoscedasticity [8]. 

When the number of samples is not very large due to computational 
constraints, the homoscedastic assumption is no longer valid, because the neighbourhood 
of samples where $w^{(i)} \neq 0$ is too large. Thus Blum et al. [9, 10] extended 
this algorithm to a nonlinear and heteroscedastic model, given by 


$$\theta^{(i)} = m(s^{(i)}) + \sigma(s^{(i)})\, e^{(i)},$$

where $\sigma^2(s^{(i)}) = \mathrm{Var}(\theta \mid s^{(i)})$ denotes the conditional variance. The variance is then 
estimated by using a second regression model for the logarithm of the squared 
residuals, given by 


$$\log\left(\theta^{(i)} - \hat{m}(s^{(i)})\right)^2 = \log \sigma^2(s^{(i)}) + \eta^{(i)},$$


where the $\eta^{(i)}$ are independent, zero-mean variates with common variance. The parameter 
adjustment can then be performed as follows: 


$$\theta^{*(i)} = \hat{m}(s_{\mathrm{obs}}) + \left(\theta^{(i)} - \hat{m}(s^{(i)})\right) \times \frac{\hat{\sigma}(s_{\mathrm{obs}})}{\hat{\sigma}(s^{(i)})}, \tag{4}$$


where $\hat{\sigma}(s)$ denotes the estimator of $\sigma(s)$. Here $\varepsilon$ plays the same role as for the 
homoscedastic model, but the model allows more flexibility for deviations from 
homoscedasticity. 


5 MCMC-ABC Algorithm 


In the Rejection-ABC and Regression-ABC algorithms, parameter values are 
sampled from the prior distribution. Thus the acceptance rate may be low if the 
prior and posterior distributions are quite different. In fact, using samples from a 
non-informative prior is very inefficient because this scheme does not account for 
the data at the proposal stage and thus may lead to proposed values located in low 
posterior probability regions. To address this issue, Marjoram et al. [33] introduced 
the following MCMC-ABC algorithm. 

Algorithm 6 MCMC-ABC algorithm 

Given the observation data $y_{\mathrm{obs}}$, summary statistics $s(\cdot)$, tolerance level $\varepsilon$, distance function $\rho(\cdot)$, 
and proposal distribution $q(\cdot)$. 

Initialize the first sample from the prior distribution $\theta^{(0)} \sim \pi(\theta)$. 


At iteration $i \ge 0$ 


1. Generate a sample from the proposal distribution $\theta' \sim q(\theta \mid \theta^{(i)})$. 
2. Simulate the model using $\theta'$ to get a dataset $x \sim p(x \mid \theta')$. 
3. Draw a sample from the uniform distribution $u \sim U(0, 1)$, and calculate the ratio 


$$\alpha = \min\left(1, \frac{\pi(\theta')\, q(\theta^{(i)} \mid \theta')}{\pi(\theta^{(i)})\, q(\theta' \mid \theta^{(i)})} \times I\left(\rho(s(x), s_{\mathrm{obs}}) < \varepsilon\right)\right).$$


Here $I(A)$ is an indicator function. 
4. If $u < \alpha$, accept the sample $\theta^{(i+1)} = \theta'$; otherwise $\theta^{(i+1)} = \theta^{(i)}$. 
5. Repeat steps 1 to 4 until the required number of posterior samples is obtained. 
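
A minimal sketch of Algorithm 6 is given below, assuming for illustration i.i.d. $N(\theta, 1)$ data summarised by the sample mean, a uniform prior on $(-5, 5)$ and a symmetric Gaussian random walk proposal, so that the ratio reduces to the indicator whenever the proposal lies inside the prior support. One deliberate deviation, labelled in the code: the chain is started from a value accepted by a rejection-ABC step rather than from an arbitrary prior draw, since a chain started far from the data may never accept a move. All names are hypothetical.

```python
import random


def simulate_summary(theta, n_data, rng):
    # Sample mean of n_data draws from the assumed N(theta, 1) model.
    return sum(rng.gauss(theta, 1.0) for _ in range(n_data)) / n_data


def mcmc_abc(s_obs, n_data, n_iter=20000, eps=0.2, step=0.5, seed=0):
    rng = random.Random(seed)
    lo, hi = -5.0, 5.0                      # bounds of the uniform prior
    # Assumption: initialise with a rejection-ABC draw to avoid a stuck chain.
    while True:
        theta = rng.uniform(lo, hi)
        if abs(simulate_summary(theta, n_data, rng) - s_obs) < eps:
            break
    chain = []
    for _ in range(n_iter):
        proposal = rng.gauss(theta, step)   # symmetric proposal: q terms cancel
        # Uniform prior: pi(theta') / pi(theta) is 1 inside the support, else 0.
        prior_ratio = 1.0 if lo <= proposal <= hi else 0.0
        s = simulate_summary(proposal, n_data, rng)
        indicator = 1.0 if abs(s - s_obs) < eps else 0.0
        alpha = min(1.0, prior_ratio * indicator)
        if rng.random() < alpha:
            theta = proposal
        chain.append(theta)                 # repeated value on rejection
    return chain
```

Since `alpha` is either zero or one here, the uniform draw is redundant in this special case; it is kept so that the code mirrors the steps of the algorithm and remains valid for non-uniform priors.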


This algorithm has a structure similar to that of standard MCMC. Both 
algorithms use a proposal distribution and prior distribution to calculate the ratio. 
The difference is that the density function is used in MCMC for computing the ratio, 
while in MCMC-ABC we treat the ratio of density functions as one if the simulation 
error satisfies the criterion. Thus the performance of MCMC-ABC strongly depends 
on the selection of the proposal distribution and prior distribution. 

A potential drawback of MCMC-ABC is that the selection of the tolerance level $\varepsilon$ and 
proposal distribution $q(\theta \mid \theta^{(i)})$ may require expensive pilot runs [26, 44]. The 
convergence property of the generated chain $(\theta^{(1)}, \ldots, \theta^{(n)})$ is important because 
the MCMC algorithm may suffer if the proposal distribution is poorly chosen [14]. A 
potential issue is that the chain may get stuck in a low probability region of the 
posterior and lead to a poor approximation [18]. Since the proposed sample $\theta'$ must 
meet two criteria, the rejection rate of MCMC-ABC may be extremely high. 


6 SMC ABC 


To tackle the challenges in MCMC-ABC, sequential Monte Carlo sampling techniques 
have been introduced to ABC. Sequential Monte Carlo sampling differs 
from the MCMC approach by using the technique of particle filtering. Rather than 
drawing one candidate sample $\theta$ at a step, this algorithm considers a pool with a 
large number of samples $(\theta_1, \ldots, \theta_N)$ simultaneously and treats each sample as a 
particle. Sisson et al. [45] proposed a method which embeds ABC simulation steps 
in the sequential Monte Carlo algorithm based on the theoretical work in [15]. This 
method generates samples from a sequence of approximate ABC posteriors under 
successively smaller acceptance tolerances [4, 46, 50]. SMC-ABC concentrates on 
simulating a dataset from the parameter regions with relatively high acceptance 
probabilities and can adapt tuning choices such as acceptance tolerances during the 
computation, which has potential advantages over the Rejection-ABC or MCMC- 
ABC. Here we illustrate the algorithm of Beaumont et al. [4]: 


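Algorithm 7 SMC-ABC 

Given the observation data $y_{\mathrm{obs}}$, summary statistics $s(\cdot)$, distance function $\rho(\cdot)$, tolerance 
thresholds $\varepsilon_1 > \cdots > \varepsilon_T$, and density kernel $K(\cdot)$. 

1. At iteration $t = 1$, 

   a. For $i = 1, \ldots, N$, repeat: 

      i. Sample $\theta_i^{(1)} \sim \pi(\theta)$, and simulate a dataset $x \sim p(x \mid \theta_i^{(1)})$; 
      ii. Accept $\theta_i^{(1)}$ if $\rho(s(x), s_{\mathrm{obs}}) < \varepsilon_1$, otherwise reject this sample. 
      iii. Set the weight $\omega_i^{(1)} = 1/N$. 

   b. Take $\tau_2^2$ as twice the empirical variance of the $\theta_i^{(1)}$'s. 

2. At iteration $2 \le t \le T$ 

   a. For $i = 1, \ldots, N$, repeat: 

      i. Pick $\theta_i^{*}$ from the $\theta_j^{(t-1)}$'s with probabilities $\omega_j^{(t-1)}$. 
      ii. Generate a sample $\theta_i^{(t)} \sim K(\theta \mid \theta_i^{*}, \tau_t^2)$, and simulate a dataset $x \sim p(x \mid \theta_i^{(t)})$. 
      iii. Accept $\theta_i^{(t)}$ if $\rho(s(x), s_{\mathrm{obs}}) < \varepsilon_t$, otherwise reject this sample. 
      iv. Set the weight of this accepted particle as 


$$\omega_i^{(t)} \propto \frac{\pi(\theta_i^{(t)})}{\sum_{j=1}^{N} \omega_j^{(t-1)}\, K(\theta_i^{(t)} \mid \theta_j^{(t-1)}, \tau_t^2)}.$$


   b. Take $\tau_{t+1}^2$ as twice the weighted empirical variance of the $\theta_i^{(t)}$'s. 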


At the first iteration, this algorithm draws samples from the prior distribution 
$\pi(\theta)$, simulates the model using each sample, calculates the summary statistics, and selects 
$N$ samples that satisfy the error criterion. This step is in fact the rejection-ABC 
algorithm. However, at the subsequent iterations, samples are drawn from a density 
kernel $K(\cdot)$ based on the previous particle population. A Gaussian kernel is used in 
Beaumont et al. [4], given by 


$$K(\theta_i^{(t)} \mid \theta_j^{(t-1)}, \tau_t^2) = \varphi\left\{\tau_t^{-1}\left(\theta_i^{(t)} - \theta_j^{(t-1)}\right)\right\},$$


where $\varphi(\cdot)$ is the density of a normal distribution. This algorithm effectively 
performs the repeated importance sampling technique, which is also known as 
population Monte Carlo [12]. Similar algorithms have been proposed by using 
different formulas to calculate the weights and different kernel functions [4, 46, 50]. 
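
The population scheme of Algorithm 7 can be sketched for a scalar parameter as follows. The sketch assumes, for illustration only, i.i.d. $N(\theta, 1)$ data summarised by the sample mean and a uniform prior on $(-5, 5)$; the tolerance schedule, particle number and function names are hypothetical choices, and proposals outside the prior support are discarded because their weight would be zero.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def weighted_variance(values, weights):
    # Variance of values under normalised weights.
    mean = sum(w * v for v, w in zip(values, weights))
    return sum(w * (v - mean) ** 2 for v, w in zip(values, weights))


def abc_pmc(s_obs, n_data, tolerances=(1.0, 0.5, 0.2), n_particles=400, seed=0):
    rng = random.Random(seed)
    lo, hi = -5.0, 5.0                          # uniform prior bounds
    prior_pdf = 1.0 / (hi - lo)

    def simulate_summary(theta):
        return sum(rng.gauss(theta, 1.0) for _ in range(n_data)) / n_data

    # Iteration t = 1: plain rejection ABC from the prior.
    particles = []
    while len(particles) < n_particles:
        theta = rng.uniform(lo, hi)
        if abs(simulate_summary(theta) - s_obs) < tolerances[0]:
            particles.append(theta)
    weights = [1.0 / n_particles] * n_particles
    tau2 = 2.0 * weighted_variance(particles, weights)

    # Iterations t >= 2: resample, perturb, accept, reweight.
    for eps in tolerances[1:]:
        new_particles = []
        while len(new_particles) < n_particles:
            centre = rng.choices(particles, weights=weights)[0]
            theta = rng.gauss(centre, math.sqrt(tau2))
            if not lo <= theta <= hi:
                continue                        # zero prior density
            if abs(simulate_summary(theta) - s_obs) < eps:
                new_particles.append(theta)
        raw = [prior_pdf / sum(w * normal_pdf(th, p, tau2)
                               for p, w in zip(particles, weights))
               for th in new_particles]
        total = sum(raw)
        particles = new_particles
        weights = [r / total for r in raw]
        tau2 = 2.0 * weighted_variance(particles, weights)
    return particles, weights
```

The weight update costs $O(N^2)$ kernel evaluations per iteration, which is usually negligible compared with the model simulations when the simulator is expensive.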

SMC-ABC addresses a potential drawback of the rejection and regression 
approaches. If the data are informative, the posterior distribution may be very narrow 
compared with the prior, and the rejection and regression algorithms may then become 
inefficient. Repeatedly sampling from a gradually improving approximation of 
the posterior will bring the distribution of summary statistics closer to the 
posterior distribution, and increase the density of samples whose summary statistics 
are located in the vicinity of the target [5]. 


7 Choice of Summary Statistics 


As discussed in previous sections, the posterior distribution $p(\theta \mid y_{\mathrm{obs}})$ is 
approximated by 


$$p(\theta \mid s_{\mathrm{obs}}) \propto p(s_{\mathrm{obs}} \mid \theta)\,\pi(\theta),$$


where $s_{\mathrm{obs}}$ is the vector of summary statistics, which usually has lower dimension than that of 
the data $y_{\mathrm{obs}}$. If $s_{\mathrm{obs}}$ is sufficient, 


$$p(\theta \mid s_{\mathrm{obs}}) = p(\theta \mid y_{\mathrm{obs}}).$$


When $s_{\mathrm{obs}}$ is highly informative, $p(\theta \mid s_{\mathrm{obs}}) \approx p(\theta \mid y_{\mathrm{obs}})$ is a good approximation. 
However, for many practical problems, it is hard to derive sufficient statistics or 
even highly informative statistics. An appropriate choice of summary statistics is 
required to balance the informativeness and low-dimensionality. In some application 
fields, there has been a history of the development of summary statistics within a 
model-based framework in recent years. However, it is also possible that empirical 
summaries can be used without any strong theory to support them. Thus the 
selection of informative summary statistics is one of the important steps in the 
application of ABC. In recent years a number of methods have been proposed 
regarding the selection of summary statistics [19, 38]. 
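
To make the notion of sufficiency concrete, consider a standard textbook case that is not taken from the works cited above: $n$ independent observations $y_1, \ldots, y_n \sim N(\theta, \sigma^2)$ with known $\sigma^2$, and the sample mean $\bar{y}$ as the summary statistic. The likelihood then factorises as

```latex
\begin{align*}
p(y_{\mathrm{obs}} \mid \theta)
  &= (2\pi\sigma^2)^{-n/2}
     \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\bar{y})^2\right)
     \exp\!\left(-\frac{n}{2\sigma^2}(\bar{y}-\theta)^2\right), \\
p(\theta \mid y_{\mathrm{obs}})
  &\propto \exp\!\left(-\frac{n}{2\sigma^2}(\bar{y}-\theta)^2\right)\pi(\theta)
   \;\propto\; p(\bar{y} \mid \theta)\,\pi(\theta)
   \;\propto\; p(\theta \mid \bar{y}).
\end{align*}
```

The first exponential factor does not involve $\theta$, so all the information about $\theta$ is carried by $\bar{y}$, and conditioning on $s_{\mathrm{obs}} = \bar{y}$ loses nothing. For most models of practical interest no such low-dimensional factorisation exists, which is why the selection methods below are needed.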

Joyce and Marjoram [24] first proposed the $\varepsilon$-sufficiency concept and score of 
statistics for selecting an additional summary statistic $s_k$ from the candidate set, 
when the model already has summary statistics $s_1, \ldots, s_{k-1}$. Later three methods 
regarding the choice of summary statistics have been used in applications [31], 
namely 


1. selection of a subset of the summary statistics that maximizes prespecified 
criteria such as the Akaike Information Criterion [11] or the entropy of a 
distribution [35]; 

2. partial least squares regression to get linear combinations of the original summary 
statistics that are maximally decorrelated and highly correlated with the parameters 
[53]; and 

3. summary statistics are chosen by minimizing a loss function under the assump- 
tion of a statistical model between parameters and transformed statistics of 
simulated data [1, 19]. 

Blum et al. [11] provided a comprehensive review of the principal methods. 
However, this topic remains a challenging problem in Bayesian inference. 


8 Early Rejection ABC 


To reduce the simulation time, a number of inference methods have been proposed 
based on the idea of early rejection. For example, the delayed ABC divides the procedure 
into two stages [13]. In the first stage, a sample of parameters may be rejected or 
accepted by using an approximated posterior distribution. If it is accepted, a standard 
ABC method will be applied in the second stage to evaluate the discrepancy between 
the observation data and simulation. This idea has been used in the MCMC-ABC 
for inferring stochastic differential equation models, in which the prior distribution 
and proposal distribution are used in the first stage for early rejection [37]. Based 
on the MCMC-ABC [37], a sample is rejected if the following ratio is less than a 
sample $u \sim U(0, 1)$, using the same notation as in Eq. (3): 


$$u > \frac{\pi(\theta^{*})\, q(\theta_i \mid \theta^{*})}{\pi(\theta_i)\, q(\theta^{*} \mid \theta_i)}. \tag{5}$$

In this approach, the density function $p(y \mid \theta)$ is removed from the ratio above. 
Thus the performance of this early-rejection technique is fully dependent on the 
choice of the proposal density function $q(\theta^{*} \mid \theta_i)$. 
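
The early-rejection idea can be sketched by drawing the uniform variate before running the simulator and skipping the simulation whenever the test of Eq. (5) already fails. The sketch below is an illustration under assumptions of our own, not the implementation of [37]: the data are i.i.d. $N(\theta, 1)$ summarised by the sample mean, the prior is $N(0, 3^2)$, the proposal is a symmetric random walk (so the proposal terms cancel), and the chain starts at the observed summary. All names are hypothetical.

```python
import math
import random


def early_rejection_mcmc_abc(s_obs, n_data, n_iter=20000, eps=0.2,
                             step=0.5, prior_sd=3.0, seed=0):
    rng = random.Random(seed)
    theta = s_obs                 # assumption: start the chain near the data
    chain = []
    n_simulations = 0             # counts how often the simulator is run
    for _ in range(n_iter):
        proposal = rng.gauss(theta, step)
        u = rng.random()
        # Symmetric proposal: the ratio in Eq. (5) reduces to the prior ratio.
        ratio = math.exp(-0.5 * (proposal ** 2 - theta ** 2) / prior_sd ** 2)
        if u > ratio:
            chain.append(theta)   # early rejection: no simulation needed
            continue
        n_simulations += 1
        s = sum(rng.gauss(proposal, 1.0) for _ in range(n_data)) / n_data
        if abs(s - s_obs) < eps:
            theta = proposal      # both criteria met: accept
        chain.append(theta)
    return chain, n_simulations
```

The chain has the same stationary distribution as plain MCMC-ABC, since a proposal is accepted exactly when `u` is below the prior-proposal ratio and the simulated summary falls within the tolerance. The saving is the difference between `n_iter` and `n_simulations`, which grows when the prior is informative.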

A recently published approach is the Lazy ABC, which proposes a random 
stopping rule to abandon simulations with unsatisfactory accuracy [39]. This method 
makes ABC more scalable to applications where simulation is expensive. The 
detailed algorithm is given below. 


Algorithm 8 Lazy ABC 


Input: prior density $\pi(\theta)$ and importance density $g(\theta)$, observation data $y_{\mathrm{obs}}$, summary statistics 
$s(\cdot)$, tolerance level $\varepsilon$, distance function $\rho(\cdot, \cdot)$, and a continuous 
probability function $\alpha(\theta, x)$. 


At iteration $i = 1 : N$ 


1. Generate a sample from the importance density $\theta^{*} \sim g(\theta)$. 

2. Simulate the model to get a dataset $x^{*} \sim p(x \mid \theta^{*})$ and let $\alpha^{*} = \alpha(\theta^{*}, x^{*})$. 

3. With probability $\alpha^{*}$ continue to step 4. Otherwise perform early rejection: namely let $\ell^{*} = 0$ 
and go to step 6. 

4. Simulate the model to get a dataset $y^{*} \sim p(y \mid \theta^{*}, x^{*})$. 

5. Set $\ell_{ABC} = \mathbb{1}[\rho(s(y^{*}), s(y_{\mathrm{obs}})) < \varepsilon]$ and $\ell^{*} = \ell_{ABC}/\alpha^{*}$. 

6. Set $w^{*} = \ell^{*}\, \pi(\theta^{*})/g(\theta^{*})$. 

7. Repeat steps 1 to 6 until the required number of posterior samples is obtained. 


Output: A set of $N$ pairs of $(\theta^{*}, w^{*})$ values. 

Details of lazy importance sampling and multiple stopping 
decisions can be found in [39]. 
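
A minimal sketch of Algorithm 8 is given below. It is an illustration under assumptions of our own and not the implementation of [39]: the importance density is taken equal to a uniform prior on $(-5, 5)$, so that $\pi(\theta^{*})/g(\theta^{*}) = 1$; the initial simulation $x^{*}$ is a pilot sample of `n_pilot` draws from $N(\theta^{*}, 1)$; and the continuation probability is 1 when the pilot mean is within 1.5 of the observed summary and 0.1 otherwise. All names and thresholds are hypothetical.

```python
import random


def lazy_abc(s_obs, n_data=16, n_pilot=4, n_iter=20000, eps=0.2, seed=0):
    rng = random.Random(seed)
    pairs = []
    n_full = 0                                   # completed simulations
    for _ in range(n_iter):
        theta = rng.uniform(-5.0, 5.0)           # step 1: theta* ~ g = prior
        pilot = [rng.gauss(theta, 1.0) for _ in range(n_pilot)]   # step 2: x*
        pilot_mean = sum(pilot) / n_pilot
        alpha = 1.0 if abs(pilot_mean - s_obs) < 1.5 else 0.1
        if rng.random() < alpha:                 # step 3: continue w.p. alpha*
            n_full += 1
            # Step 4: complete the dataset conditional on the pilot part.
            rest = [rng.gauss(theta, 1.0) for _ in range(n_data - n_pilot)]
            s = (sum(pilot) + sum(rest)) / n_data
            l_abc = 1.0 if abs(s - s_obs) < eps else 0.0
            l_star = l_abc / alpha               # step 5: reweight by 1 / alpha*
        else:
            l_star = 0.0                         # early rejection
        w = l_star * 1.0                         # step 6: pi / g = 1 here
        pairs.append((theta, w))
    return pairs, n_full
```

Dividing by `alpha` compensates for the simulations that were abandoned, so the weighted sample targets the same ABC posterior as the standard algorithm. The continuation probability must stay strictly positive wherever acceptance is possible, otherwise the estimate becomes biased.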


9 ABC Software Packages 


A number of computer software packages have been designed in recent years to 
implement ABC in different platforms using various computer languages. A soft- 
ware package, BioBayes, provides a framework for Bayesian parameter estimation 
and evidential model ranking over models of biochemical systems using ordinary 
differential equations. This package is extensible allowing additional modules 
to be included [52]. A Python package, ABC-SysBio, implements parameter 
inference and model selection for dynamical systems in the ABC framework [29]. 
This package combines three algorithms: ABC rejection sampler, SMC ABC for 
parameter inference, and SMC ABC for model selection. It is designed to work with 
models written in Systems Biology Markup Language (SBML). Deterministic and 
stochastic models can be analyzed in ABC-SysBio. In addition, a computational 
tool SYSBIONS has been designed for model selection and parameter inference 
using nested sampling [23]. Using a data-based likelihood function, this package 
calculates the evidence of a model and the corresponding posterior parameter 
distribution. This is a C-based, GPU-accelerated implementation of nested sampling 
that is designed for biological applications. 

On the R platform, a number of software packages have also been designed. 
Among them, the package abc implements Rejection ABC with many methods of 
regression post-processing, while EasyABC implements a wide suite of ABC 
algorithms but not post-processing [36]. The package abctools has been designed to 
complement the existing software provision of ABC algorithms by focusing on tools 
for tuning them. It implements many previously unavailable methods from the literature 
and makes them easily available to the research community [36]. In addition, there 
are two ABC packages implemented as MATLAB toolboxes. EP-ABC has been 
designed for state space models and related models, and ABC-SDE for inferring 
parameters in stochastic differential equations [37]. Some other 
software packages have been reviewed in [36], including ABCreg, ABCtoolbox, 
Bayes SSC, DIY-ABC, and PopABC. 


10 Conclusion 


In this chapter, we have reviewed a number of algorithms of ABC, together with the 
relevant improvements, from choice of summary statistics to early rejection, aiming 
at increasing the statistical accuracy and computational efficiency. In addition, we 
give a few of the widely used software packages for the practical use of ABC 
algorithms. In recent years, the ABC methods have been applied to a wide range of 
inference problems in biology, economics, engineering and physical sciences. These 
applications have also raised more challenging questions for parameter inference, 
such as high-dimensional data [28, 34] and stochastic modeling [55], which provide 
interesting topics for future research. 


The Loop-Weight Changing Operator 
in the Completely Packed Loop Model 


Bernard Nienhuis and Kayed Al Qasimi 


Abstract Loop models are statistical ensembles of closed paths on a lattice. The 
most well-known among them has a variety of names, such as the dense O(n) loop 
model and the Temperley-Lieb (TL) model. This note concerns the model in which the 
loop weight is n = 1, and a local operator which changes the weight of all the 
loops that surround the position of the operator to some other value. A conjecture for 
the expectation value of the one-point function of this operator was formulated 15 
years ago. In this note we sketch the proof. 


1 Introduction 


It has long been recognized that loop models can represent many different local spin 
models in statistical mechanics. The model we deal with in this note was introduced 
[1] as a representation of the Potts model. It has a free parameter, the weight of a 
loop, which is the square root of the number of states of the Potts model. The case 
that this weight is unity corresponds to the bond percolation model, and is the case 
we deal with here. The configurations of the model are a tiling of the square lattice, 
in which each face of the lattice is covered with one of two tiles 


[Figure: the two tiles, not reproduced] 


Thus in every configuration the red arcs in the tiles form paths on the lattice, 
which are either closed (hence loops), or terminating at the boundary, if there is 
one. The partition sum of the model is trivial, it is the product over all faces of 
the sum of the two weights of the faces. Non-trivial are observables, which give 
weights to the configurations according to some specific properties of the paths. 
The loop-weight changing operator (LWCO) inserted at a given vertex, gives the 
loops surrounding that vertex a new weight w possibly different from that of the 
other loops, i.e. from one. The expectation value of the one-point function of this 
operator is the generating function of the probabilities of having a specific number 
of loops surrounding the point of insertion. For brevity we will use the short-hand 
LWCO also for one-point function of this operator, trusting the context will make it 
unambiguous. 

A new approach to the study of this model originates in the work of Razumov and 
Stroganov [2], who found a connection between the XXZ model and combinatorial 
problems such as Alternating Sign Matrices (ASM) and Plane Partitions (PP) [3]. 
A connection with loop models followed quickly [4] and led to the famous 
Razumov-Stroganov conjecture featuring a connection between two types of loop 
models on different geometries [5]. It was proven by Cantini and Sportiello [6]. 
These connections led to a wealth of explicit formulae for expectation values 
of observables and of indicator functions, some proven others conjectured. The 
value of these is that the formulae are (supposedly) exact with finite distances and 
geometries, rather than only in the scaling limit. One of these observables is the 
LWCO that turns the loop weight of surrounding loops into w, on a cylinder of 
infinite length and circumference L. Mitra and Nienhuis [7] conjectured the value 
of its one-point function P(L, w) = F(L, w)/F(L, 1), with 


L-1 
Fin.@ +a \2ta yO det a (F) bad. (1) 


r,s=0 


This expression was not based on any theoretical understanding, let alone a deriva- 
tion or proof. It was completely guessed from the recognition of the coefficients in 
the polynomial, and subsequently verified for large but finite L. We remark that the 
symmetry for $a \to 1/a$ is manifest in the LHS, but not in the RHS of Eq. (1). 

A discussion with Christian Hagendorf at the Matrix workshop Statistical 
Mechanics, Combinatorics and Conformal Field Theory in 2017, eventually led to 
a proof, which we will sketch in this note. The line of argument is as follows. We 
will first generalize the model to be inhomogeneous, thus introducing a number of 
variables on which the LWCO depends. We will show that the LWCO is a rational 
function of these variables. A family of recursion relations in the size of the system 
can be used to fix the value of the numerator and the denominator, for a number 
of values of one of the variables. The number of values suffices to completely 
determine these functions, by the polynomial interpolation formula. A publication 
of Hagendorf and Morin-Duchesne [8] suggested the inhomogeneous generalization 
of Eq. (1). It then remained to show that this expression satisfies the same recursion 
relation as the LWCO. 


2 Inhomogeneous TL Model 


An important step towards a partial proof of the RS conjecture by Di Francesco and 
Zinn-Justin [9], was making the TL model inhomogeneous, by associating a variable 
to each column and each row of faces in the lattice. We imagine the axis of the 
cylinder to be vertical, so the columns run along the length of the cylinder, and the 
rows form rings around the cylinder. These variables are often called rapidities due 
to their role in the relativistic field theory which is the scaling limit of the model. The 
Boltzmann weights at a particular face then depend on the two rapidities associated 
with the column and row the face is in. Specifically the Boltzmann weight of a face 
can be written as 


=i 
ee ee BL a 
R(w, z) qu — quiz ‘ales qw—¢iz Dy (2) 


where $z$ and $w$ are the variables associated with the column and row, respectively, 
that the face belongs to, and $q = e^{2\pi i/3}$. The two coefficients add up to one, and 
are real positive when $z/w = e^{i\theta}$ with $\theta \in (0, 2\pi/3)$. The transfer matrix can be 
written as 


$$T(w, \mathbf{z}) = T(w; z_1, z_2, \ldots, z_L) = \prod_{i=1}^{L} R(w, z_i), \tag{3}$$


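where we use $\mathbf{z}$ as a shorthand for $\{z_1, z_2, \ldots, z_L\}$. A term in the expansion of this 
product corresponds to the graph 


[Figure: graph of one term in the expansion of the transfer matrix, not reproduced] 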


This transfer matrix acts as a stochastic matrix in the space of so-called link
patterns. In each link pattern the edges cut by the rim of the cylinder are connected
pairwise, by paths that do not intersect, as in the following example for $L = 7$ and
$L = 8$.

For odd $L$ the link pattern includes an unpaired edge, which is connected to a
path running all along the half-infinite cylinder. For even $L$ we mark the disk with a puncture,
to distinguish whether a path between two edges at the rim runs along one side of the cylinder
or the other. When two edges that are connected in a link pattern are reconnected by
the transfer matrix or another operator, the resulting loop is removed.

Due to the Yang-Baxter equation, $[T(w, z), T(v, z)] = 0$, and consequently the
eigenvectors of $T(w, z)$ do not depend on $w$. With $\Psi(z)$ we denote the ground
state, i.e. the eigenvector with eigenvalue 1. In the regime where all transfer matrix
elements are non-negative this is the largest (Perron-Frobenius) eigenvalue. Because
the transfer matrix is a rational function of the variables $z_i$, $\Psi(z)$ is also rational,
and with suitable normalization polynomial. As shown in [9], $\Psi(z)$ satisfies


$$\check{R}_i(z_i, z_{i+1})\, \Psi(z_1, \ldots, z_{i+1}, z_i, \ldots, z_L) = \Psi(z_1, \ldots, z_i, z_{i+1}, \ldots, z_L), \tag{4}$$


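where the operator

$$\check{R}_i(z_i, z_{i+1}) = \frac{q z_i - q^{-1} z_{i+1}}{q z_{i+1} - q^{-1} z_i}\, 1 + \frac{z_i - z_{i+1}}{q z_{i+1} - q^{-1} z_i}\, e_i \tag{5}$$

contains the generator $e_i$, which acts on a link pattern by connecting the partners of positions $i$ and $i + 1$, and creating an arc connecting
$i$ and $i + 1$ themselves. These equations (4) are called quantum Knizhnik-Zamolodchikov
(qKZ) equations as they are analogous to $q$-deformed versions of
the Knizhnik-Zamolodchikov equations [10] on correlation functions in conformal
field theory.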

We write the ground state vector 


$$\Psi(z) = \sum_{\alpha} \psi_\alpha(z)\, \alpha, \tag{6}$$


where $\alpha$ is a link pattern, and the sum is over all link patterns of a given size (i.e.
circumference of the cylinder). For the weights $\psi_\alpha$ of link patterns $\alpha$ in which the
positions $i$ and $i + 1$ are not connected by a small (minimal) arc, Eq. (4) leads to


$$(q z_{i+1} - q^{-1} z_i)\, \psi_\alpha(\ldots, z_i, z_{i+1}, \ldots) = (q z_i - q^{-1} z_{i+1})\, \psi_\alpha(\ldots, z_{i+1}, z_i, \ldots). \tag{7}$$


For the polynomial solution this implies that $\psi_\alpha(\ldots, z_i, z_{i+1}, \ldots)$ must contain the
factor $(q z_i - q^{-1} z_{i+1})$, and is otherwise symmetric under interchange of $z_i$ and $z_{i+1}$.
If this argument is used recursively on a link pattern not containing a minimal arc in
a sequence of edges $\{k, \ldots, n\}$, its weight must contain the factor


$$\prod_{i=k}^{n-1} \prod_{j=i+1}^{n} (q z_i - q^{-1} z_j)$$


and be otherwise symmetric in the variables $\{z_k, \ldots, z_n\}$. For the most nested link
pattern $\mu$ (again for $L = 7$ and $L = 8$), in which the small arc not containing the
puncture connects the positions 1 and $L$, the weight
contains the factor $\prod_{1 \le i < j \le L} (q z_i - q^{-1} z_j)$. Because all other weights can be
derived from this one with the qKZ equations (4), from which functions symmetric
in $z_i$ and $z_{i+1}$ can be factored out, the weight of the most nested link pattern $\mu$ is in
fact given by


$$\psi_\mu(z) = \prod_{i=1}^{L-1} \prod_{j=i+1}^{L} (q z_i - q^{-1} z_j) \tag{8}$$


with no further factors symmetric in $z$. Clearly $\psi_\mu(z)$ is homogeneous and of joint
degree $L(L - 1)/2$. As a polynomial in a single $z_i$ it is of degree $L - 1$. These
properties carry over to all $\psi_\alpha$, as they are conserved by Eq. (4).
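These properties of the most nested weight are easy to confirm numerically. The sketch below assumes the product form $\prod_{i<j}(q z_i - q^{-1} z_j)$ for Eq. (8) and the exchange relation (7) in the form $(q z_{i+1} - q^{-1} z_i)\,\psi(\ldots, z_i, z_{i+1}, \ldots) = (q z_i - q^{-1} z_{i+1})\,\psi(\ldots, z_{i+1}, z_i, \ldots)$; the helper names are hypothetical.

```python
import cmath
import math
import random

Q = cmath.exp(2j * math.pi / 3)


def psi_mu(z):
    """Weight of the most nested link pattern: prod_{i<j} (q z_i - z_j / q)."""
    weight = 1
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            weight *= Q * z[i] - z[j] / Q
    return weight


def exchange_residual(z, i):
    """|LHS - RHS| of the exchange relation (7) at 0-based positions i, i+1."""
    swapped = list(z)
    swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    lhs = (Q * z[i + 1] - z[i] / Q) * psi_mu(z)
    rhs = (Q * z[i] - z[i + 1] / Q) * psi_mu(swapped)
    return abs(lhs - rhs)


def homogeneity_residual(z, scale):
    """|psi(scale*z) - scale**(L(L-1)/2) * psi(z)| for a list z of length L."""
    degree = len(z) * (len(z) - 1) // 2
    return abs(psi_mu([scale * x for x in z]) - scale ** degree * psi_mu(z))


if __name__ == "__main__":
    rng = random.Random(0)
    z = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]
    print(max(exchange_residual(z, i) for i in range(4)))
    print(homogeneity_residual(z, 1.7))
```

Both residuals should vanish up to rounding for any choice of the variables, since swapping two neighbouring variables only changes the single factor that couples them.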


3 Recursions in System Size 


Reference [9] also shows that (4) implies a recursion relation between the ground
states of systems differing in size by 2. For this it is useful to introduce operators
that mediate between link patterns of different sizes. The operator $\sigma_i$ introduces two
additional positions between positions $i - 1$ and $i$, and a small arc that connects
them. Conversely, the operator $\tau_i$ connects (the partners of) $i - 1$ and $i$, and then
removes the positions themselves. Thus, the operator $\sigma_i$ acting on a link pattern
of size $L$ results in a link pattern of size $L + 2$, and $\tau_i$ results in a link pattern of size
$L - 2$. These operators satisfy


$$\sigma_i \tau_i = e_i \quad \text{and} \quad \tau_i \sigma_i = 1 \tag{9}$$

and acting on the ground state vectors gives


2 
Wojal. +s Zi-1 2G, 2 Eicon) = 
L 


Wal....%-17,--)@'-gz[[-@-ay 0) 


i= 
and 


2 
tT W(..., 2-1, 2°, 295 2i5--.) = 


L 
VWotitiunG —gel[[=G=ar ay 


i=1 


These relations for ground state elements can be read as recursion relations in the
system size, as the LHS refers to a system of size $L + 2$, and the RHS has size $L$.
We call these relations the fusion recursion relations.

Later Di Francesco et al. [11] introduced another recursion in the system size,
not by fixing the ratio between two variables, but by sending one to zero. Since we
extend their results we treat this in some more detail. When one of the variables $z_i$
is zero, the Boltzmann weights in the corresponding column are, from Eq. (2),


$$R(w, 0) = -q\,\big[\text{first face diagram}\big] - q^{-1}\,\big[\text{second face diagram}\big]. \tag{12}$$


We wish to relate a system with size L to a system with size L + 1, with the same 
variables, and one additional variable equal to zero. We will refer to this recursion 
relation as the braid recursion relation, as the R-operator reduces to a so-called braid 
operator that satisfies the Reidemeister moves of the braid group. 

Consider a path in the size-$L$ system that crosses the location of the rapidity-zero
column, wanders around and crosses back on a face adjacent to the other crossing.
In the size-$(L + 1)$ system, we consider the same configuration, combined with all
possible configurations in the rapidity-zero column. For the weight of these two
cases we get the following equation:


[Diagrammatic equation not reproduced: the path configurations in the size-$L$ and size-$(L + 1)$ systems.] (13)

The green line is the line with variable 0, and the red curves are the paths, while
the two faces where the crossing occurs are indicated with a black box. We see
that in the system of size $L + 1$, there are only two possible connectivities in
four configurations. In one case, with weight 1, the path continues as in the size-$L$
system, but avoiding the detour, while the paths entering the two boxes vertically
are connected along the detour that is avoided by the original path. In the other
cases, the branches of the path connect up or down to the vertical. The total weight
of the last connectivity is $1 + q^2 + q^{-2} = 0$, so it can be disregarded.
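The cancellation rests only on $q$ being a primitive cube root of unity. As a minimal check, assuming that the two face states of the rapidity-zero column carry the weights $-q$ and $-q^{-1}$ as in Eq. (12), the four configurations of two adjacent faces can be enumerated directly:

```python
import cmath
import math
from itertools import product

Q = cmath.exp(2j * math.pi / 3)

# Assumed weights of the two face states in the rapidity-zero column, Eq. (12).
BRAID_WEIGHTS = (-Q, -1 / Q)

# Two adjacent faces of that column: four configurations, weights multiply.
pair_weights = [u * v for u, v in product(BRAID_WEIGHTS, repeat=2)]

total = sum(pair_weights)        # 1 + 1 + q^2 + q^-2
discarded = 1 + Q**2 + Q**-2     # weight of the connectivity that drops out

if __name__ == "__main__":
    print(abs(total - 1), abs(discarded))
```

The four products are $q^2$, 1, 1 and $q^{-2}$, so one configuration of weight 1 survives while the other three cancel.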

If this applies to paths which cross in two adjacent faces, the same must apply to
paths which cross the zero-rapidity line at more distant faces, as all the crossings in
between these two faces form a nested set of double crossings.

In conclusion one can say that any path that crosses the rapidity-zero column 
once and back, has the same connectivity between the systems of size L and that of 
L+1. 

When $L$ is odd, the system has one path (the defect path) that is not closed, but
runs along the infinite cylinder from one end to the other. The system with size $L + 1$
then does not contain such a path, so when the system is extended with a rapidity zero,
the defect path must join up with the additional rapidity-zero column. However, this
can be done in two ways, as the puncture can be placed on either side of the path.
Figure 1 shows the two possibilities. Since there is no way to exclude one, we accept
both, with weights $b$ if the puncture is to the right of the juncture, and $c$ if it is to
the left as indicated in the figure. The result is that the puncture is “inside” the path
with weight $b$ and “outside” with weight $c$.

The defect line can intersect the zero-rapidity column any number of times, even
or odd. Therefore an additional crossing should not make any difference in the
weights. Figure 2 shows the configurations if the defect path crosses one more time
before it joins up with the zero-rapidity line. Now we see that the puncture is inside
with weight $-c q^{-1}$ and outside with weight $-bq - cq - bq^{-1}$. This is consistent
only when both


$$c q^{-1} = -b \quad \text{and} \quad bq + cq + bq^{-1} = -c. \tag{14}$$


Fig. 1 The puncture relative to the path formed by the unmatched path and the zero-rapidity
line. The defect path is shown in red, the zero-rapidity line is indicated in green, and the
puncture is black. The two panels are labelled b and c

Fig. 2 The configurations formed by placing the puncture on either side of the path and uncrossing
the intersection between the path and the rapidity-zero line in the two possible ways. As in Fig. 1
the green line represents the (unresolved) zero-rapidity line, and the red curves represent paths


These (overdetermined) equations are solved by

$$b = (-q)^{-1/2} \quad \text{and} \quad c = (-q)^{1/2}, \tag{15}$$

where we chose $c = b^*$. If the defect path is connected to the zero-rapidity
line in the opposite direction, the same solution is found. In conclusion, since an
intersection of the defect line with the zero-rapidity line does not affect the weight
of a configuration, we may argue without loss of generality as if the defect line never
crosses the zero-rapidity line.
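That the two conditions in Eq. (14) are compatible can be confirmed with a few lines of arithmetic. The sketch below assumes the solution $b = (-q)^{-1/2}$, $c = (-q)^{1/2}$ with principal square roots, and additionally evaluates $b + c$ and $b^2 + c^2$, two combinations that enter the weight counting further on.

```python
import cmath
import math

Q = cmath.exp(2j * math.pi / 3)

# Assumed solution of Eq. (14), principal branch of the square root.
b = (-Q) ** -0.5
c = (-Q) ** 0.5

first_condition = abs(c / Q + b)                    # c q^-1 = -b
second_condition = abs(b * Q + c * Q + b / Q + c)   # b q + c q + b q^-1 = -c
conjugate_choice = abs(c - b.conjugate())           # c = b*

if __name__ == "__main__":
    print(first_condition, second_condition, conjugate_choice)
    print(b + c, b * b + c * c)
```

With this branch choice $b = e^{i\pi/6}$ and $c = e^{-i\pi/6}$, so that $b + c = \sqrt{3}$ and $b^2 + c^2 = 1$.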


4 Recursion Relations for the LWCO 


In the previous section we showed that the qKZ equations induce recursion relations
in the system size for the elements of the ground state vector, and for the probability
of certain events. In this section we will present how these recursions lead to
recursion relations for the LWCO in the inhomogeneous TL model on an infinite
cylinder. First we remind the reader that the elements of the ground state vector, that
is the relative configurational weights of a half-infinite cylinder, are polynomials of
degree $L - 1$ in each of the rapidities. Thus the configurational weights of the infinite
cylinder, made up of two half-infinite cylinders, have degree $2(L - 1)$. Therefore the
LWCO is a rational function of degree $2(L - 1)$ for both the numerator and denominator.
We can determine these polynomials completely from knowing their values
for $2L - 1$ values of one of the variables. Let $z_i$ be the variable of choice. We can
apply the fusion recursion relation when $z_i = q^{-1} z_{i-1}$ or $z_i = q z_{i+1}$, and the braid
recursion relation when $z_i = 0$. However, the qKZ equations (4) ensure that both
the numerator and denominator are symmetric functions of the variables. Therefore
we can use the fusion recursion relation for $z_i = q^{-1} z_j$ or $z_i = q z_j$ for any $j \neq i$.
Together with the braid recursion relation this gives precisely enough values.
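The counting argument is the usual one for polynomial interpolation: a polynomial of degree $2L - 2$ in one variable is fixed by its values at $2L - 1$ distinct points. A minimal illustration with exact rational arithmetic follows; the polynomial used is a hypothetical stand-in and has nothing to do with the actual numerator or denominator of the LWCO.

```python
from fractions import Fraction


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


def example_polynomial(x):
    """Hypothetical stand-in of degree 2L - 2 = 6 (for L = 4); not the LWCO numerator."""
    return 3 * x**6 - x**5 + 4 * x**3 - 7 * x + 2


L = 4
nodes = list(range(2 * L - 1))  # 2L - 1 = 7 interpolation points
samples = [(x, example_polynomial(x)) for x in nodes]

if __name__ == "__main__":
    print(lagrange_eval(samples, 10) == example_polynomial(10))
```

Dropping a single sample leaves the degree-6 polynomial undetermined, which is why the braid recursion at $z_i = 0$ is needed in addition to the $2(L - 1)$ fusion points.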

To establish some notation, let us use $\Phi(w, z)$ for the (one-point function of the)
LWCO, with altered loop weight $w$, and $\Phi_n(w, z)$ and $\Phi_d(z)$ for its polynomial
numerator and denominator:

$$\Phi(w, z) = \frac{\Phi_n(w, z)}{\Phi_d(z)}. \tag{16}$$

Because $\Phi_d(z) = \Phi_n(1, z)$ it suffices to study $\Phi_n$. To denote the recursions, we will
use $z$ for the list $\{z_1, z_2, \ldots, z_L\}$ as before, $(z \,|\, z_i \to v)$ to indicate that $z_i$ takes a
specific value $v$, and $(z \setminus i)$ for the list $z$ from which $z_i$ is omitted, and similarly
$(z \setminus i, j)$ from which both $z_i$ and $z_j$ are omitted. The system size is implicit as the
length of the last argument. We choose $z$ to have length $L$ always, so that e.g. $(z \setminus i)$
has length $(L - 1)$.

The weight of a specific combination of link patterns in the two half-infinite
cylinders is the product of two ground state elements. But the dependence on the
variables is different, as the order of one is reversed relative to the other. The list of
variables in reversed order is denoted as $\rho z$. For example, the sum of weights of all
link patterns for both halves of the cylinder should be equal to $\Phi_d(z)$, so

$$\Phi_d(z) = \sum_{\alpha, \beta} \psi_\alpha(z)\, \psi_\beta(\rho z) = \Big( \sum_{\alpha} \psi_\alpha(z) \Big)^2. \tag{17}$$

The order of the variables is immaterial because the qKZ equations ensure that the
sum of all elements of the ground state vector is a symmetric function.


4.1 The Fusion Recursion Relation 


From Eqs. (10) and (11) it is clear that


L 
@a(z| zi > gzj) = Salz\i, ) (-3¢) G [] -@u'ej - ee", (18) 
k#i,j 
and 
L 
Da(z| zi > qo 25) = Palz\i, f) (-39™) G [] -@zi - ee. (19) 
k#i,j 


In order to see what the analogous recursion is for $\Phi_n$, we first use the symmetry
of $\Phi_n$ under permutation of the $z$ to place $z_i$ and $z_j$ adjacent. As shown in [9], the
corresponding double column simply connects the paths on the left to those on the
right in the same row. This implies that the topology of the paths in the system of
size $L$ and that of size $L - 2$ is the same. As long as the LWCO is not inserted
between the two adjacent columns carrying the variables $z_i$ and $z_j$, the number of
loops surrounding the point of insertion is the same in the two models. This implies
that $\Phi_n$ satisfies precisely the same fusion recursion relations (18) and (19) as $\Phi_d$,
irrespective of the value of the altered loop weight. Clearly, the difference between
the two functions $\Phi_n$ and $\Phi_d$ must come from the braid recursion relation.


4.2 The Braid Recursion Relation


We consider an inhomogeneous TL model on a cylinder with perimeter $L - 1$ and
another one on a cylinder with perimeter $L$, with the same variables supplemented
with one variable equal to zero. In this we consider a contractible loop in the
size-$(L - 1)$ system, that intersects the position of the zero-rapidity column in two
adjacent faces, and study how this configuration is resolved in the size-$L$ system.
This is shown in Fig. 3. We observe that if the LWCO is not inserted in the original
loop, the weight in the size-$L$ system is equal to that in the size-$(L - 1)$ system,
multiplied with the sum of the four weights, that is $1 + 1 + q^2 + q^{-2} = 1$. If the
LWCO is inserted in the original loop, to the left of the zero-rapidity line, then it sits
in a loop in the upper-right figure, and not in the remaining configurations, whose
weights add up as $1 + q^2 + q^{-2} = 0$. In the surviving (upper-right) configuration, the
two ends of the zero-rapidity line are connected by a path. Likewise if the operator
is inserted inside the loop, to the right of the zero-rapidity line, it sits in a loop in
the lower-left figure, and again not in the remaining figures. As far as this type of
configuration is concerned, the weight of the LWCO is the same between the system
of size $L$ and that of size $(L - 1)$.

When a contractible loop intersects the zero-rapidity line in two arbitrary faces,
other loops inside it intersect the zero-rapidity line in a nested fashion, in which
the loops deepest in the nesting cut the zero-rapidity line in adjacent faces. We can
resolve this recursively, starting with the loops deepest in the nest, and then the loops
enclosing them, and so on. We conclude that in a system of size $L$, in which one of
the variables is zero, the relative weight of configurations with any given number of
loops surrounding an operator insertion is exactly the same as in the system of
size $(L - 1)$ with only the $(L - 1)$ non-zero rapidities. This suggests that the
braid recursion relation is also the same for $\Phi_n$ as for $\Phi_d$. However, this is not the case.
Since the braid recursion relation relates even and odd sized systems, a defect line
may appear or disappear or turn into a loop.

Fig. 3 A closed, contractible loop that cuts the rapidity-zero line in consecutive faces. The path as
resolved in the size-$L$ system is drawn red, and the continuation of the zero-rapidity line is shown
in green. The total weight of the two faces is given for each configuration

In Eq. (1) the determinantal expression is divided by $(a + 1/a)$ for odd $L$. So
in the determinant itself, while the loops that surround the operator insertion have
weight $(a^2 + a^{-2})$, the defect line has weight $(a + 1/a)$. It is convenient to take
this operator to replace the original LWCO: in the numerator the surrounding loops
have weight $(a^2 + a^{-2})$, and the defect line has co-varying weight $(a + 1/a)$. In
analogy, in the denominator all loops have weight 1, but the defect line has weight
$\sqrt{3}$. So, from now on, we multiply the LWCO in odd-$L$ systems by the simple
factor $(a + 1/a)/\sqrt{3}$, and rename the resulting one-point function $\Phi$ and similarly
its numerator and denominator $\Phi_n$ and $\Phi_d$. The results for the fusion recursion
relations are not altered, as the factors appear on the RHS and LHS of the recursion
equally.

First we will consider a system of odd size $L$, and send one of its rapidities to
zero. We have already seen that for a single insertion of the LWCO, all the loops in
the corresponding system of size $(L - 1)$ that surround it, translate into a surrounding
loop in the system of size $L$, with precisely the same weight. What changes is that
in all configurations in the larger system a defect line appears, of which the weight
is always $(a + 1/a)$. Thus we find for odd $L$

$$\Phi_n(a^2 + a^{-2}, z \,|\, z_i \to 0) = (a + a^{-1})\, \Phi_n(a^2 + a^{-2}, z \setminus i)\, F_i(z), \tag{20}$$

where the function $F_i(z)$ is a symmetric function of $(z \setminus i)$, which we do not need,
as it does not depend on $a$.

Second we consider a system of even size $L$, and send one of its rapidities to
zero. Now the smaller system has a defect line, and it has weight $(a + a^{-1})$ in all
configurations. We have seen above that the zero-rapidity line effectively creates a
new path. The defect path in the system of size $(L - 1)$ joins up with this new path
to form a loop. While it does this the puncture for either end of the cylinder is placed
to the right or left of the juncture with weight $b$ or $c$, respectively, as illustrated in
Fig. 4. The defect line and the zero-rapidity line form a loop. If this loop separates
the two punctures, it must wind the cylinder. This happens with weight $c^2 + b^2 =
-q - q^{-1} = 1$. The loop surrounds the operator insertion if it separates this from
both punctures. Irrespective of where the insertion is, it is surrounded with weight
1 and not surrounded with weight 1. Thus, in total the insertion is surrounded with
weight 1, and not surrounded with weight 2. Thus we see that the weight $(a + a^{-1})$
for the defect line in the size-$(L - 1)$ system is replaced by $(a^2 + a^{-2})$ with weight 1,
and by 1 with weight 2. In total the weight $(a + a^{-1})$ in the small system is replaced
by $(a^2 + a^{-2} + 2) = (a + a^{-1})^2$ in the big system. Thus also for even $L$ we find

$$\Phi_n(a^2 + a^{-2}, z \,|\, z_i \to 0) = (a + a^{-1})\, \Phi_n(a^2 + a^{-2}, z \setminus i)\, F_i(z), \tag{21}$$

just as for odd $L$. With $\Phi_d(z) = \Phi_n(1, z)$ and $\Phi(w, z) = \Phi_n(w, z)/\Phi_d(z)$ this
completes the braid recursion relation for the LWCO.

Fig. 4 The possible configurations of the puncture relative to the closed path in the width-$(L + 1)$
system made up of the unmatched path in the width-$L$ system and the zero-rapidity line, with
respective weights (top) and current factor (bottom)
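The weight bookkeeping of the even-$L$ case can be collected in one place. The lines below are only a restatement of the numbers quoted in the text, under the assumption that $b = (-q)^{-1/2}$ and $c = (-q)^{1/2}$, so that $b^2 + c^2 = 1$ and $bc = 1$; the split of the total into a winding and a non-winding part is our reading of the argument.

```latex
\begin{align*}
(b + c)^2 &= \underbrace{b^2 + c^2}_{\text{loop winds the cylinder}}
           + \underbrace{2bc}_{\text{loop does not wind}} = 1 + 2 = 3, \\
\underbrace{1 \cdot (a^2 + a^{-2})}_{\text{insertion surrounded}}
 + \underbrace{2 \cdot 1}_{\text{insertion not surrounded}}
          &= a^2 + a^{-2} + 2 = (a + a^{-1})^2 .
\end{align*}
```

Setting $a^2 + a^{-2} = 1$ turns the second line into $3 = (\sqrt{3})^2$, which matches the defect-line weight $\sqrt{3}$ used in the denominator.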


5 The Inhomogeneous Expression for LWCO 


Now that we have found recursion relations that should be satisfied by the LWCO
of the inhomogeneous TL model, we have the means to prove an expression if we
have it. However, until now only the homogeneous limit was known. In the MATRIX
workshop on Statistical Mechanics, Combinatorics and Conformal Field Theory in
2017, Christian Hagendorf pointed us to his publication [8] in which an expression
appears equivalent to (1). It also gives a more general expression of which
(1) is the homogeneous limit. Since we had calculated explicit expressions for the
inhomogeneous LWCO for small systems ($L < 9$), it was not difficult to verify
that indeed his expression is what we need. To make this explicit, we introduce the
shorthand $[x] = (x - x^{-1})$, and we propose that


(qzi — q7!zj)(qzj — qo zi 


Zi) 
1/2 | ee 


®, (a> + a, Z)= 
l<i<j<L E fj j “i 


L 
L =] 
iff 1/2 es a es ee 
IT [ai J ae, 72,172] * [12,1 aa 
. GZ; 2; qe 5% 


That this expression does indeed satisfy the recursion relations we have derived can
be shown by explicit row and column manipulations of the matrix after specification
of the last variable $z_L \to 0$ or $z_L \to q z_j$ or $z_L \to q^{-1} z_j$ for any $j \neq L$, respectively.
As mentioned before, this gives $2L - 1$ values for a polynomial that has degree $2L - 2$
in $z_L$ and thus proves the inhomogeneous expression for the LWCO. The fact that (1)
is the homogeneous limit is shown in [8], which proves the expression conjectured
in [7].


Acknowledgements We thank Christian Hagendorf for his important contribution to this result,


A Note on Optimal Double Spending Attacks


Juri Hinz and Peter Taylor 


Abstract In the present note we address the important problem of stability of 
blockchain systems. The so-called “double-spending attacks” (attempts to spend 
digital funds more than once) have been analyzed by several authors. We re-state 
these questions under more realistic assumptions than previously discussed and 
show that they can be formulated as an optimal stopping problem. 


1 Introduction 


In recent years, novel concepts originating from the blockchain idea have gained 
popularity. Their rapidly emerging software realizations are based on a mixture of 
traditional techniques (peer-to-peer networking, data encryption) and more modern 
concepts (consensus protocols). Digital currencies represent assets of these systems;
their transactions are written and kept in an electronic ledger as part of the operation
of the blockchain system. The main difference from a traditional financial system is
that crypto-currencies are not issued and supervised by a central authority, but are 
maintained by joint efforts of a network consisting of independent computers (all 
running the same/similar software). Such a network searches for consensus which 
yields a common version of the ledger shared by all participants. The consensus 
is reached by means of a process, which is called mining and is usually backed 
by economic incentives. Blockchain systems are believed to achieve the same level 
of certainty and security as those governed by a central authority but do that at 
significantly lower costs. Furthermore, because of the distributed, decentralized, and
homogeneous architecture of the network, a blockchain system can reach a very high
level of stability due to data redundancy and hard/software replicability.

Following the mining process, all network participants append, validate, and 
mutually agree on a common version of the data history, which is usually referred 
to as the blockchain ledger. Although the invention of mining is considered to 
be a real break-through which solves the long-standing consensus problem in 
computer science, there is criticism of this approach. The problem is that to reach 
consensus, real physical resources/efforts must be spent or at least allocated. For 
instance, the traditional Bitcoin protocol requires participants to solve cryptographic 
puzzles with real consumption of computing power and energy. This process is 
referred to as the so-called proof of work. Other blockchain systems avoid resource 
consumption and require temporary allocation of diverse resources, for instance the 
ownership of the underlying digital assets (proof of stake) or their spending (proof of 
burn). Furthermore, commitment of storage capacity (proof of storage) or a diverse 
combination of resource allocation/consumption can also be used. 

Let us briefly elaborate on the proof of work; more details can be found in the
excellent book “Mastering Bitcoin” [1] by Andreas Antonopoulos. We focus on
the Bitcoin protocol which was initiated by Nakamoto [4], with a refinement
on the double-spending problem in [5] and later in [3], and with further considerations
addressing propagation delay in [2]. In this framework, the ledger consists of a
chain of blocks and each block contains valid transactions. The nodes compete to 
add a new block to the chain, and while doing so, each node attempts to collect 
transactions and solve a cryptographic puzzle. Once this puzzle is solved, it is made 
public to other nodes. This protocol also prescribes that if a peer node reports a 
completed block, then it must be verified, and if this block is valid, it must be 
attached to the chain, all uncompleted blocks shall be abandoned and a new block 
continuing the chain must be started. However, even following these rules, the chain 
forks regularly, which results in different nodes working on different branches. In 
order to reach a consensus in such cases, the protocol prescribes that a branch with 
shorter length must be abandoned as soon as a longer branch becomes known. 


2 The Double-Spending Problem 


Now we return to the resilience of the protocol to attacks. Note that within a 
blockchain system, the nodes are running publicly available open-source software 
(for mining) which can easily be modified by any private user to control the 
computer nodes in order to undermine the system. In principle, there are many 
ways of doing this. One of the most obvious among malicious strategies would be 
an attempt to spend the electronic money more than once. The analysis of such a 
strategy is referred to as the double-spending problem. 

In the classical [4, 5] formulation of this problem a merchant waits for $n \in \mathbb{N}_0$
confirming blocks after a payment by a buyer, before providing a product or service.
While the network is mining these $n$ blocks, the attacker tries to build his/her own
secret branch containing a version of the history in which this payment is not
made. The idea is to not include the paying transaction in the private secret branch,
which is published once its length has overtaken that of the official branch. If this strategy
succeeds, then the private secret branch becomes official and the payment disappears
from the ledger after the product/service is taken by the attacker. Nakamoto [4]
provides and Rosenfeld [5] refines an estimate of the attacker’s success probability
depending on his/her computational power and the number $n$ of confirming blocks.

Let us briefly discuss their result before we elaborate on further details. In
the framework of the double-spending problem, it is assumed that a continuous-time
Markov chain taking values in $\mathbb{Z}$ describes the difference in blocks between the official
and secret branches. As in [5], we consider this process at time points at which a
new block in one of the branches is completed, which yields a discrete-time Markov
chain $(Z_t)_{t=0}^{\infty}$. Having started secret mining after the block including the attacker’s
payment (at block time $t = 0$, $Z_0 = 0$) the attacker considers the following
situation: At each time $t = 1, 2, 3, \ldots$, a new block in one of the branches (official
or secret) is found, and the block difference changes by $\pm 1$ with probabilities (see [2, 5])


$$P(Z_t = z + 1 \mid Z_{t-1} = z) = 1 - q,$$
$$P(Z_t = z - 1 \mid Z_{t-1} = z) = q,$$


where $q \in\, ]0, 1[$ is the ratio of the computational power controlled by the attacker
to the total mining capacity. Consider a realistic case where the attacker controls
a smaller part of the mining power, $0 < q < 1/2$, than that controlled by honest
miners. In this case, if at any block time $t = 0, 1, 2, \ldots$ the block difference is
$z \in \mathbb{Z}$, then the probability $a_\infty(z, q)$ that the secret branch overtakes the official
branch within unlimited time after $t$ is given by


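$$a_\infty(z, q) = P\Big( \min_{u = 0}^{\infty} Z_u < 0 \,\Big|\, Z_0 = z \Big) = \begin{cases} 1 & \text{if } z < 0, \\ \big( \frac{q}{1 - q} \big)^{z + 1} & \text{otherwise.} \end{cases} \tag{2.1}$$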

Furthermore, at the time when the $n$-th block in the official branch is mined, the
probability that the attacker has mined $m = 0, 1, 2, \ldots$ blocks follows the negative
binomial distribution whose distribution function is given by

$$F_{q,n}(k) = \sum_{m=0}^{k} \binom{n + m - 1}{m} (1 - q)^n q^m, \qquad k = 0, 1, 2, \ldots \tag{2.2}$$


Both results (2.1) and (2.2) are combined in [5] to obtain the success probability of 
the double-spending as follows: Consider the situation where at the time the n-th 
block in the official chain is completed, the attacker has mined m > n blocks which 
can be published immediately. The probability of this event is given by 


$$\sum_{m=n+1}^{\infty} \binom{n + m - 1}{m} (1 - q)^n q^m = 1 - \sum_{m=0}^{n} \binom{n + m - 1}{m} (1 - q)^n q^m = 1 - F_{q,n}(n).$$

Next, consider the opposite event, assuming that when the $n$-th official block is
completed, the attacker has not overtaken the official chain, in which case $m \le n$. In
this case, the probability of winning the race is given by

$$\sum_{m=0}^{n} \binom{n + m - 1}{m} (1 - q)^n q^m\, a_\infty(n - m, q) = \frac{q}{1 - q} \sum_{m=0}^{n} \binom{n + m - 1}{m} (1 - q)^m q^n = \frac{q}{1 - q}\, F_{1-q,n}(n).$$

Clearly, the success probability is given by

$$1 - F_{q,n}(n) + \frac{q}{1 - q}\, F_{1-q,n}(n). \tag{2.3}$$


For instance, if the merchant waits for six confirming blocks the attack succeeds 
with probabilities 


0.00037% for $q = 6\%$, and with 0.0025% for $q = 8\%$.


As a result, waiting for six blocks after the payment has been considered as secure 
in the sense that with realistic efforts it is practically impossible to succeed with 
double spending. 
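The quoted figures can be recomputed from the formulas above. The following sketch assumes the negative binomial distribution function (2.2) in the form $F_{q,n}(k) = \sum_{m \le k} \binom{n+m-1}{m}(1-q)^n q^m$ and the success probability (2.3) in the form $1 - F_{q,n}(n) + \frac{q}{1-q} F_{1-q,n}(n)$; the function names are our own.

```python
from math import comb


def neg_binom_cdf(q, n, k):
    """F_{q,n}(k): probability that at most k secret blocks are mined
    by the time the n-th official block is completed, Eq. (2.2)."""
    return sum(comb(n + m - 1, m) * (1 - q) ** n * q ** m for m in range(k + 1))


def success_probability(q, n):
    """Success probability of the double-spending attack in the form (2.3)."""
    already_ahead = 1 - neg_binom_cdf(q, n, n)
    catches_up_later = q / (1 - q) * neg_binom_cdf(1 - q, n, n)
    return already_ahead + catches_up_later


if __name__ == "__main__":
    for q in (0.06, 0.08):
        print(f"q = {q:.0%}: success probability = {success_probability(q, 6):.5%}")
```

For $n = 6$ the two terms turn out to be of comparable size, so neither the immediate win nor the later catch-up can be neglected at these mining ratios.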


Remark Note that in the original work [5] it was assumed that the attacker can start 
the race having pre-mined one block. This leads to a different success probability 


$$1 - F_{q,n}(n - 1) + F_{1-q,n}(n - 1). \tag{2.4}$$


While the difference between (2.4) and (2.3) can be significant (see Fig. 1), it is not 
clear how to achieve an advantage of being able to start the race with one block 
ahead of the official chain. The present note is devoted to this interesting question. 
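To get a feeling for the size of the pre-mining advantage, the two expressions can be evaluated side by side. This is a sketch under the assumption that (2.3) reads $1 - F_{q,n}(n) + \frac{q}{1-q} F_{1-q,n}(n)$ and (2.4) reads $1 - F_{q,n}(n-1) + F_{1-q,n}(n-1)$, with $F$ the negative binomial distribution function (2.2); the function names are hypothetical.

```python
from math import comb


def F(q, n, k):
    """Negative binomial distribution function F_{q,n}(k) of Eq. (2.2)."""
    return sum(comb(n + m - 1, m) * (1 - q) ** n * q ** m for m in range(k + 1))


def success_without_premining(q, n):
    """Success probability in the form (2.3)."""
    return 1 - F(q, n, n) + q / (1 - q) * F(1 - q, n, n)


def success_with_premined_block(q, n):
    """Success probability in the form (2.4), one block mined in advance."""
    return 1 - F(q, n, n - 1) + F(1 - q, n, n - 1)


if __name__ == "__main__":
    for q in (0.02, 0.06, 0.10):
        plain = success_without_premining(q, 6)
        premined = success_with_premined_block(q, 6)
        print(f"q = {q:.2f}: (2.3) = {plain:.3e}, (2.4) = {premined:.3e}")
```

For small mining ratios the pre-mined block raises the success probability by roughly an order of magnitude, which is the gap visible between the solid and dashed curves.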


[Figure: two panels, each plotted against the ratio q on a horizontal axis running from 0.00 to 0.10.]


Fig. 1 The success probability (and its logarithm) of the double spending attack for $n = 6$
confirming blocks depending on the mining ratio $q \in [0, 0.1]$, calculated by (2.3) (solid line)
versus (2.4) (dashed line)

The above analysis [5] calculates the probability of the alternative blockchain
getting ahead of the official one. It doesn’t consider revenues and losses from a
successful/failed attack. Furthermore, the possibility of canceling the secret mining (if
the block difference becomes too high) is not considered. Most important, however, 
is the question why the paying transaction must be placed right after fork-off. Note 
that this assumption is justifiable only if the merchant requires immediate payment 
after the purchase is agreed upon, otherwise canceling the deal. However, in reality, 
the attacker may be able to freely choose the time of payment, in particular when 
buying goods from web portals. That is, an attempt to overtake the official chain 
before launching an attack can give an advantage in the spirit of the above remark. 


3 A Refinement of the Double-Spending Problem 


Let us consider an alternative situation. Assume that the attacker can freely choose 
the time of payment. Doing so, he/she can start working on a private secret branch 
long before the payment is placed. For such situations, the analysis of the double- 
spending is different and requires solving (multiple) stopping problems. 

Consider a finite time horizon where t € {0,..., 7} represents the number of 
blocks mined in the official chain since the branch has forked off. That is, we 
suppose that our secret mining starts at the block time t = 0. We interpret T € No as 
the maximal length of the official branch, which can be abandoned if a longer branch 
has been discovered. To the best of our knowledge, the current Bitcoin protocol 
does not have such a restriction, meaning that the shorter branch must always be 
discarded, independently of its length. However, other blockchain systems discuss
“checkpoints” and “gates” with a similar functionality. A finite time horizon yields 
conceptual advantages and presents a negligible deviation from the reality since T 
can be sufficiently large. 

Recall the process $(Z_t)_{t=0}^{\infty}$, describing the branch length difference at times the
next block in one of the branches has been mined. Now, consider another process
$(X_t)_{t=0}^{\infty}$ where $X_t$ stands for the branch length difference between the official and
secret branches at the times $t = 0, 1, \ldots$, when one new block in the official branch
is completed. It turns out that $(X_t)_{t=0}^{\infty}$ is a Markov chain, whose transition
from state $X_t = x$ to $X_{t+1} = x + 1 - y$ describes the event that while one
official block has been mined, $y = 0, 1, 2, \ldots$ secret blocks have been obtained.
The transition probabilities of $(X_t)_{t=0}^{\infty}$ are given for all $t = 0, 1, \ldots$ by


Gtk), k € No, 


i 
0, kEZ\No, wh 


PX set i-kiX=a={ 


in terms of the geometric distribution 


G(k) =(1—q)g* fork €No={0,1,2,...,}. (3.2) 
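The transition rule (3.1) with the geometric law (3.2) can be illustrated by a short simulation. The following is only a sketch: the function names, the value q = 0.3 and the seed are illustrative choices and are not taken from the text.

```python
import math
import random

def sample_secret_blocks(q, rng):
    """Draw k from G(k) = (1 - q) * q**k, k = 0, 1, 2, ..., by inversion."""
    u = 1.0 - rng.random()              # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(q))

def simulate_branch_difference(q, steps, rng):
    """Return the path X_0, ..., X_steps of the chain started at X_0 = 0."""
    path = [0]
    for _ in range(steps):
        k = sample_secret_blocks(q, rng)    # secret blocks found meanwhile
        path.append(path[-1] + 1 - k)       # one official block, k secret ones
    return path

rng = random.Random(2017)
print(simulate_branch_difference(q=0.3, steps=10, rng=rng))
```

Each step adds one official block and subtracts the number of secret blocks found in the meantime, so the path can rise by at most one per step but may drop by any amount.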
Suppose that the secret branch contains the invalidation of the paying transaction.
Recall that this can be achieved by simply not including the attacker’s paying
transaction. If the attack is launched at a block time $t \in \{0, \dots, T\}$, then the
payment will be included in block $t + 1$ of the official branch. In this context, the
crucial question is whether to attack or not and how to choose the time $t = 0, \dots, T$
optimally. It turns out that under specific assumptions, this question can be treated
as an optimal stopping problem, which we formulate next.

According to our modeling with a finite time horizon, we agree that for $t > T - n$
a successful attack is not possible. Namely, since the payment is placed into block
$t + 1$ and $n$ confirming blocks are expected, the last confirmation block $t + n > T$
would be beyond the maximal branch length which can be abandoned. That is, we
can assume that the time $t$ must be chosen within the finite horizon $t = 0, \dots, \tilde{T}$
with the last time point $\tilde{T} = T - n$. The decision whether to attack must be based
on the current block time $t = 0, \dots, \tilde{T}$ and on the current block difference $X_t$. In
order to optimize the time $t = 0, 1, \dots, \tilde{T}$, we define the success event $S(t)$ for the
attack launched at $t$ as


\[ S(t) = \Big\{ \min_{i = t+n+1}^{T+1} X_i < 0 \Big\}, \qquad t = 0, \dots, \tilde{T}. \]


Hence the expected reward of the attack is 


\[ R_t(x) = E\big( C\, 1_{S(t)} - c\, 1_{S(t)^c} \mid X_t = x \big) = (C + c)\, P(S(t) \mid X_t = x) - c, \qquad t = 0, \dots, \tilde{T}, \ x \in \mathbb{Z}, \tag{3.3} \]


where the numbers $C > 0$ and $c > 0$ represent the revenue and loss resulting
from the success or failure of the attack. Let us agree that $t = +\infty$ stands for the
attacker’s option not to attack, which can be optimal if the chance of overtaking
the official branch is too low. In order to model this option, we extend the
reward function (3.3) to the time argument $t = \infty$ as

\[ R_\infty(x) = 0, \qquad x \in \mathbb{Z}. \tag{3.4} \]


In this context, the choice of the optimal payment time $\tau^*$ yields an optimal stopping
problem of the following type:

determine a maximizer $\tau^*$ to $\mathcal{T} \to \mathbb{R}$, $\tau \mapsto E(R_\tau(X_\tau))$, where
$\mathcal{T}$ denotes all $\{0, \dots, \tilde{T}\} \cup \{+\infty\}$-valued stopping times. (3.5)
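A stopping problem of this type on a finite horizon can be solved numerically by backward induction. The sketch below is not the authors' method. It assumes that the attack launched at $t$ succeeds when the branch difference is negative at some block time in $\{t+n+1, \dots, T+1\}$, that $X_0 = 0$, and that the chain moves according to the geometric law (3.2). The function name and all parameter values are hypothetical.

```python
from math import comb

def optimal_attack_value(q, n, T, C, c):
    """Value at (t, x) = (0, 0) of the stopping problem, by backward induction."""
    T_last = T - n                                   # last admissible attack time
    G = lambda k: (1 - q) * q**k                     # geometric law (3.2)
    # Law of the number of secret blocks found during n + 1 official blocks.
    NB = [comb(m + n, m) * (1 - q)**(n + 1) * q**m for m in range(T + 2)]

    # hit[i][x] = P(min_{j=i..T+1} X_j < 0 | X_i = x) for 0 <= x <= i.
    hit = {T + 1: [0.0] * (T + 2)}
    for i in range(T, n, -1):
        nxt = hit[i + 1]
        hit[i] = [sum(G(k) * nxt[x + 1 - k] for k in range(x + 2)) + q**(x + 2)
                  for x in range(i + 1)]

    def success(t, x):
        s = x + n + 1                  # X_{t+n+1} = s - (secret blocks found)
        if s < 0:
            return 1.0
        h = hit[t + n + 1]
        head = sum(NB[m] * h[s - m] for m in range(s + 1))
        return head + (1.0 - sum(NB[:s + 1]))        # tail: X_{t+n+1} < 0

    lo = -(n + 1)                      # below lo success is certain, value is C
    V_next = None                      # after T_last: never attack, value 0
    for t in range(T_last, -1, -1):
        V = []
        for x in range(lo, t + 1):
            reward = (C + c) * success(t, x) - c     # reward for attacking now
            cont = 0.0
            if V_next is not None:
                kmax = x + 1 - lo                    # x + 1 - k stays >= lo
                cont = sum(G(k) * V_next[x + 1 - k - lo] for k in range(kmax + 1))
                cont += q**(kmax + 1) * C            # jumps below lo
            V.append(max(reward, cont))
        V_next = V
    return V_next[-lo]

print(optimal_attack_value(q=0.3, n=2, T=20, C=1.0, c=1.0))
```

The recursion compares the reward for attacking now with the expected value of waiting one more official block. The option never to attack enters as the terminal value 0.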
4 Conclusion


Having assumed that the payment moment can be chosen by the attacker, the 
solution to the double spending problem consists of secret mining, followed by a 
later payment. The optimal payment time is determined by the current numbers of 
blocks in both chains since their fork off. The success probability of such an attack is 
dependent on the required confirmation block number n and the revenue/loss C > 0, 
c > 0 caused by the success/failure of the attack. This contribution shows that 
the optimization of the attack under these assumptions requires solving an optimal 
stopping problem. The authors will address these problems in future research. 


Acknowledgement The first author expresses his warmest gratitudes to Kostya Borovkov for 
Stochastic Maximum Principle on a Continuous-Time Behavioral Portfolio Model


Qizhu Liang and Jie Xiong 


Abstract In this short note, we consider the optimization problem with probability 
distortion when the objective functional involves a running term which is given by 
an S-shaped function. A stochastic maximum principle is presented. 


1 Introduction 


There have been several epoch-making achievements in the history of finance theory over
the past 70 years. The first is expected utility maximization, proposed by von
Neumann and Morgenstern [17]. It is premised on the tenets that decision makers
are rational and consistently risk averse under uncertainty. Later on, a
Nobel-prize-winning work, Markowitz’s mean-variance model [12], came out. Along with
these theories, many approaches to continuous-time portfolio selection problems, such
as dynamic programming, the stochastic maximum principle, martingale methods and convex
duality, have been developed; see Merton [13], Peng [14], Duffie and Epstein [4],
Yong and Zhou [19], Karatzas et al. [9].

On account of substantial phenomena violating the basic tenets of conventional
financial theory, for instance the Allais paradox [1], Tversky and Kahneman
[16] put forward cumulative prospect theory (CPT) and Benartzi and Thaler [3]
proposed behavioral economics. Both integrate psychology with finance and
economics. To study the continuous-time portfolio choice problem, we concentrate
on CPT in this paper. Its key elements are: (1) a benchmark (evaluated at terminal time
$T$) serves as a base point to distinguish gains from losses (without loss of generality,
it is assumed to be 0 in this paper); (2) utility functions are concave for gains and
convex for losses, and steeper for losses than for gains; (3) probability distortions
(or weightings) are nonlinear transformations of the probability measures, which
overweight small probabilities and underweight moderate and high probabilities.

There has been burgeoning research merging CPT into portfolio investment. Most
of it is limited to the discrete-time setting; see for example Benartzi and Thaler
[2], Shefrin and Statman [15], Levy and Levy [10]. The pioneering analytical
research on continuous-time asset allocation featuring behavioral criteria was done
by Jin and Zhou [7]. Since then, a few extensions have been published; see
He and Zhou [5, 6], Xu and Zhou [18], Jin and Zhou [8] and so on. Jin and Zhou
developed a new theory to work out the optimal terminal value in a continuous-time
CPT model. Nonetheless, their theory aims at a particular portfolio choice problem
in a self-financing market.

This article deals with probability distortion for a model with running utilities.
In order to come closer to reality, bankruptcy is not allowed in our problem. The
remainder is organized as follows. The next section formulates a general
continuous-time portfolio selection model under CPT, featuring S-shaped utility functions
and probability distortions. The stochastic maximum principle as well as a solvable
example are then presented.


2 Problem Formulation 


Let $T > 0$ be a fixed time horizon and $(\Omega, \mathcal{F}, P, \{\mathcal{F}_t\}_{t \ge 0})$ a filtered complete
probability space on which is defined a standard $\mathcal{F}_t$-adapted $m$-dimensional
Brownian motion $W_t = (W_t^1, \dots, W_t^m)'$ with $W_0 = 0$. It is assumed that
$\mathcal{F}_t = \sigma\{W_s : 0 \le s \le t\}$, augmented by all the null sets. Throughout this paper
$A'$ denotes the transpose of a matrix $A$; $a^{\pm}$ denote the positive and negative parts
of the real number $a$.

We define a positive state process

\[ \begin{cases} dX_t = b(t, u_t, X_t)\,dt + \sigma(t, u_t, X_t)\,dW_t, \\ X_0 = x_0 > 0, \end{cases} \tag{2.1} \]


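and the agent’s prospective functional

\[ J(u_\cdot) = E \int_0^T \Big( \zeta_+(u_t^+)\, w_+'\big(1 - F_{u_t^+}(u_t^+)\big) - \zeta_-(u_t^-)\, w_-'\big(1 - F_{u_t^-}(u_t^-)\big) \Big)\,dt + E\Big( l(X_T)\, w_+'\big(1 - F_{X_T}(X_T)\big) \Big), \tag{2.2} \]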


where $u_\cdot$ is a control process taking values in a convex set $U \subset \mathbb{R}$. According
to CPT, the following assumptions will be in force throughout this paper, where $x$
denotes the state variable and $u$ denotes the control variable.


(H.1) $b(\cdot, \cdot, \cdot), \sigma(\cdot, \cdot, \cdot) : [0, T] \times U \times \mathbb{R}^+ \to \mathbb{R}$ are
continuously differentiable with respect to $(u, x)$ with Lipschitz continuous first
derivatives. We further assume $b(t, u, 0) = \sigma(t, u, 0) = 0$.
(H.2) $\zeta_\pm(\cdot), l(\cdot) : \mathbb{R}^+ \to \mathbb{R}^+$ are differentiable, strictly increasing, strictly
concave, with $\zeta_\pm(0) = l(0) = 0$ and $\zeta_\pm'(0+) = l'(0+) = +\infty$.

(H.3) $w_+(\cdot), w_-(\cdot) : [0, 1] \to [0, 1]$ are differentiable and strictly increasing,
with $w_+(0) = w_-(0) = 0$, $w_+(1) = w_-(1) = 1$. Moreover, the first derivatives of
$w_+(\cdot)$, $w_-(\cdot)$ are all bounded.
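Terms of the form $E[\,l(X)\, w'(1 - F_X(X))\,]$, as in the prospective functional, are distorted expectations: for a continuous random variable $X \ge 0$ and an increasing $l$ with $l(0) = 0$, integration by parts gives $E[\,l(X)\, w'(1 - F_X(X))\,] = \int w(P(X > x))\, dl(x)$. The following sketch checks this numerically. The choices of $X$ uniform on $(0, 1)$, $l(x) = x$ and $w(p) = p^2$ are illustrative assumptions and do not come from the text.

```python
def distorted_terminal_value(l, w_prime, n=100000):
    """E[l(X) w'(1 - F_X(X))] for X uniform on (0, 1), by the midpoint rule."""
    h = 1.0 / n
    return sum(l((i + 0.5) * h) * w_prime(1.0 - (i + 0.5) * h) for i in range(n)) * h

def choquet_value(l_prime, w, n=100000):
    """Integral of w(P(X > x)) dl(x) for X uniform on (0, 1)."""
    h = 1.0 / n
    return sum(w(1.0 - (i + 0.5) * h) * l_prime((i + 0.5) * h) for i in range(n)) * h

a = distorted_terminal_value(lambda x: x, lambda p: 2.0 * p)
b = choquet_value(lambda x: 1.0, lambda p: p * p)
print(a, b)   # both approximate 1/3
```

The distortion $w(p) = p^2$ used here satisfies (H.3): it is differentiable and strictly increasing on $[0, 1]$, fixes 0 and 1, and has a bounded derivative.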


Let 


T 
U= ful [0, T] x 2 > U | u; is F;-adapted and | |u,|*dt < co}. 
0 


Definition 1 A control process $u_\cdot \in \mathcal{U}$ is said to be admissible, and $(u_\cdot, X_\cdot)$ is
called an admissible pair, if


1. $X_\cdot$ is the unique solution of Eq. (2.1) under $u_\cdot$;
2. For any $t \in [0, T]$, the distribution functions of $u_t^{\pm}$ are continuous except at 0;


3. Ef |te(utyor(1 — F,z(u#))|*dt < 00. 
pt d +) |8 ne ty |4 
4. E fo (gp incu | + [eL@ur)| at < 00. 


The set of all admissible controls is denoted by $\mathcal{U}_{ad}$.


Meanwhile, the following technical assumption on the terminal state is in force
throughout this paper.


Assumption 1 The terminal state $X_T$ corresponding to the control process $u_\cdot \in \mathcal{U}_{ad}$
is assumed to have a continuous distribution function. Besides,


d 8 
3|1(X7)w' (1 — Fx, (Xr))|° + 5|—— nl(X1)| +E|I"(X7)|* <00. (2.3) 


Problem Our optimal control problem is to find $\bar{u}_\cdot \in \mathcal{U}_{ad}$ such that

\[ J(\bar{u}_\cdot) = \max_{u_\cdot \in \mathcal{U}_{ad}} J(u_\cdot). \tag{2.4} \]


3 A Necessary Condition for Optimality 


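The current section presents the main result of the article. Let $(\bar{u}_\cdot, \bar{X}_\cdot)$ be an optimal
pair of the problem (2.4). We proceed to present the condition it must satisfy. To
this end, we formulate the adjoint equation

\[ \begin{cases} dp_t = -\big( b_x(t, \bar{u}_t, \bar{X}_t)\, p_t + \sigma_x(t, \bar{u}_t, \bar{X}_t)\, q_t \big)\,dt + q_t\,dW_t, \\ p_T = l'(\bar{X}_T)\, w_+'\big(1 - F_{\bar{X}_T}(\bar{X}_T)\big). \end{cases} \tag{3.1} \]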


Here is the necessary condition we obtained for the optimality of the control. 
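Theorem 1 If $\bar{u}_\cdot$ is the optimal control with the state trajectory $\bar{X}_\cdot$, then there exists
a pair $(p_\cdot, q_\cdot)$ of adapted processes which satisfies (3.1) such that, for a.e. $t \in [0, T]$,

\[ p_t\, b_u(t, \bar{u}_t, \bar{X}_t) + \sigma_u(t, \bar{u}_t, \bar{X}_t)\, q_t = \begin{cases} -\zeta_+'(\bar{u}_t^+)\, w_+'\big(1 - F_{\bar{u}_t^+}(\bar{u}_t^+)\big) & \text{if } \bar{u}_t > 0, \\ -\zeta_-'(\bar{u}_t^-)\, w_-'\big(1 - F_{\bar{u}_t^-}(\bar{u}_t^-)\big) & \text{if } \bar{u}_t < 0. \end{cases} \tag{3.2} \]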


Recall the state equation (2.1) and the adjoint equation (3.1). Given an optimal
control $\bar{u}_\cdot$, there exists a unique solution $X(\bar{u}_\cdot)$ to the state equation. As $p_T$ is known,
the unique solution $(p(\bar{u}_\cdot), q(\bar{u}_\cdot))$ of the backward SDE (3.1) is obtained. Plugging
$X(\bar{u}_\cdot)$ and $(p(\bar{u}_\cdot), q(\bar{u}_\cdot))$ into (3.2), the optimal control $\bar{u}_\cdot$ is narrowed down to one of the
solutions of the algebraic equation so obtained.

In what follows, we present a solvable example and compare the result with the
one without probability distortions. The process $u_t$ in the objective functional is
replaced by $u_t X_t$, with $u_t$ signifying a proportion of the wealth process. We study a case with
a compounded cost function.


Example 1 Let $u_t, X_t > 0$, $b(t, u, x) = -ux$, and $\sigma(t, u, x) = x$. We take the utility
function $\zeta_+(x) = \frac{x^\alpha}{\alpha}$ $(0 < \alpha < 1)$, and the distortion function (see Lopes [11])


w+(p) = vp’! +(—vfl1-d— pe], =», p>0,0<v<1. 
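Then,

\[ dX_t = -u_t X_t\,dt + X_t\,dW_t, \qquad X_0 = x_0, \]

and

\[ J(u_\cdot) = E \int_0^T \Big( \frac{1}{\alpha} (u_t X_t)^{\alpha}\, w_+'\big(1 - F_{u_t X_t}(u_t X_t)\big) + X_t \Big)\,dt. \]

By Theorem 1, its optimal solution $(\bar{u}_\cdot, \bar{X}_\cdot)$ should satisfy

\[ p_t = (\bar{u}_t \bar{X}_t)^{\alpha - 1}\, w_+'\big(1 - F_{\bar{u}_t \bar{X}_t}(\bar{u}_t \bar{X}_t)\big), \qquad \text{a.e. } t \in [0, T], \ \text{a.s.}, \tag{3.3} \]

and

\[ dp_t = \Big( \bar{u}_t p_t - (\bar{u}_t \bar{X}_t)^{\alpha - 1}\, w_+'\big(1 - F_{\bar{u}_t \bar{X}_t}(\bar{u}_t \bar{X}_t)\big)\, \bar{u}_t - 1 \Big)\,dt + q_t\,dW_t, \qquad p_T = 0. \]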
It yields $p_t = T - t$, $q_t = 0$ for all $t \in [0, T]$. Plugging back into equality (3.3),


- 1/(a—1) 
we obtain that u,X; = (ain) ,aet € [0, T], a.s. Solving the state 
equation, we arrive at 
t(T — 5)'/@-) 
is = (T —1)'/0-) /¥, (xo —v) (B+ D/O + i als as. 
0 Ss 


where V; = exp{B; — 5}. 
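The state equation of the example, $dX_t = -u_t X_t\,dt + X_t\,dW_t$, can be explored numerically. The sketch below is not the optimal control derived above. It assumes a constant control $u$, for which Itô's formula gives $X_T = x_0 \exp((-u - \tfrac12) T + W_T)$ and hence $E[X_T] = x_0 e^{-uT}$. The function name, parameter values and seed are hypothetical.

```python
import math
import random

def terminal_wealth(x0, u, T, rng):
    """Sample X_T for dX = -u X dt + X dW with a constant control u."""
    w_T = rng.gauss(0.0, math.sqrt(T))              # Brownian motion at time T
    return x0 * math.exp((-u - 0.5) * T + w_T)      # explicit solution

rng = random.Random(0)
samples = [terminal_wealth(1.0, 0.2, 1.0, rng) for _ in range(200000)]
mean = sum(samples) / len(samples)
print(mean, math.exp(-0.2))     # sample mean vs. E[X_T] = x0 * exp(-u T)
```

The sampled state stays strictly positive, in line with the requirement that bankruptcy is not allowed.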
Number of Claims and Ruin Time for a Refracted Risk Process


Yanhong Li, Zbigniew Palmowski, Chunming Zhao, and Chunsheng Zhang 


Abstract In this paper, we consider a classical risk model refracted at a given level.
We give an explicit expression for the joint density of the ruin time and the cumulative
number of claims counted up to the ruin time. The proof is based on solving some
integro-differential equations and employing Lagrange’s Expansion Theorem.


1 Introduction 


Between 20.11.2017 and 8.12.2017 the international research institute MATRIX
in Creswick, Australia, ran the research program Mathematics of Risk, during which
four 5-h workshops were given. In particular, Z. Palmowski presented a workshop
entitled Ruin probabilities: exact and asymptotic results. This paper is closely
related to the topics introduced during his lectures.

The joint density of the ruin time and the number of claims counted until the ruin
time has already been studied for a classical risk process in recent years. Dickson [3]
derived a special expression for it using probabilistic arguments. Landriault et al. [11]
analyzed this object for the Sparre Andersen risk model with exponential claims.
Later Frostig et al. [6] generalized it to the case of a renewal risk model with
phase-type claims and inter-arrival times. The main tool used there was the duality
between the risk model and the workload of a single server queueing model. Zhao
and Zhang [20] considered a delayed renewal risk model, where the claim size is
Erlang(n) distributed and the inter-arrival time is assumed to be infinitely divisible.

Our goal is to derive an expression for the joint density of the ruin time and the
number of claims counted until the ruin time for a refracted classical risk process (see
Kyprianou and Loeffen [9] for a formal definition). It is also called a compound
Poisson risk model under a threshold strategy. The latter process is a classical risk
process whose dynamics are changed by subtracting off a fixed linear drift whenever
the cumulative risk process is above a pre-specified level $b$. This subtraction of the
linear drift corresponds to dividend payments. Dividend strategies for insurance risk
models were first proposed by De Finetti [2] to reflect more realistically the surplus
cash flows in an insurance portfolio. More recently, many kinds of risk-related quantities under
threshold dividend strategies have been studied by Lin and Pavlova [16], Zhu and
Yang [22], Lu and Li [14, 15, 18], Badescu et al. [1], Gao and Yin [7] (see also the references
therein). The case when the drift of the refracted process vanishes (everything
above the threshold $b$ is paid as dividends) is called a barrier strategy; see Lin et al. [17],
Li and Garrido [12], Zhou [21] and the references therein.

The paper is organized as follows. In Sect. 2 we define the model we deal
with in this paper. In Sect. 3 we recall properties of the translation operator and
the root of the Lundberg fundamental equation. In particular, we introduce
Lagrange’s Expansion Theorem and some notation. In Sect. 4 we construct two
integro-differential equations identifying the joint Laplace transform of
the number of claims counted up to the ruin time and the ruin time. Analytical
solutions of these two integro-differential equations are given in Sect. 5. Applying
Lagrange’s Expansion Theorem, in Sect. 6 we give the expression for the
above-mentioned density.


2 Model 


The classical risk process is given by

\[ U(t) = u + c_1 t - S(t), \tag{1} \]

where $U(0) = u$ denotes the initial capital, $c_1$ is the premium rate and $S(t) = \sum_{i=1}^{N_t} X_i$
represents the total amount of claims that have occurred up to time $t \ge 0$. Here $\{X_i\}_{i \in \mathbb{N}}$
are non-negative i.i.d. random variables with pdf $f(x)$ and cdf $F(x)$, and $\{N_t\}_{t \ge 0}$
is an independent Poisson process with parameter $\lambda$. To take into account dividend
payments paid when the regulated process (after deduction of dividends) is above a fixed
threshold level $b > 0$, we consider the so-called refracted process given formally for
$c_2 < c_1$ by

\[ dU_b(t) = \begin{cases} c_1\,dt - dS(t), & 0 \le U_b(t) \le b, \\ c_2\,dt - dS(t), & U_b(t) > b, \end{cases} \tag{2} \]

and $U_b(0) = u$. In this case $c_1 - c_2$ denotes the intensity of dividend payments, see
Fig. 1.

Fig. 1 Graphical representation of the surplus process $U_b(t)$

Throughout this paper, we will assume that $c_2 > \lambda E X_1$, which means that the refracted
process $U_b(t)$ tends to infinity almost surely. We can then consider the ruin time

\[ \tau = \inf\{t \ge 0 : U_b(t) < 0\} \]

($\tau = \infty$ if ruin does not occur). Note that $N_\tau$ represents the number of claims
counted until the ruin time. The main goal of this paper is the identification of the
density of $(\tau, N_\tau)$. We start by analyzing its Laplace transform:

\[ \phi(u) = E\big[ r^{N_\tau} e^{-\delta \tau}\, I(\tau < \infty) \mid U_b(0) = u \big] \tag{3} \]
\[ \phantom{\phi(u)} = \sum_{n=1}^{\infty} r^n \int_0^{\infty} e^{-\delta t}\, w(u, n, t)\,dt, \tag{4} \]

where

\[ w(u, n, t) = P(N_\tau = n, \ \tau \in dt \mid U_b(0) = u)/dt \]

is the joint density of $(\tau, N_\tau)$ when $U_b(0) = u$. In the above definition we have $\delta > 0$
and $r \in (0, 1]$. Later we will use the following notation:

\[ w_1(u, n, t) = w(u, n, t) \quad \text{for } u \le b \]

and

\[ w_2(u, n, t) = w(u, n, t) \quad \text{for } u > b. \]
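The pair formed by the ruin time and the number of claims until ruin can be sampled directly from the dynamics (2). The following sketch assumes exponentially distributed claims with rate mu, which is only one possible choice of $f$, and truncates the simulation at a finite horizon. The function names and parameter values are hypothetical.

```python
import math
import random

def grow(x, w, b, c1, c2):
    """Surplus after w time units without claims, starting from x >= 0."""
    if x >= b:
        return x + c2 * w
    to_barrier = (b - x) / c1                # time needed to reach level b
    if w <= to_barrier:
        return x + c1 * w
    return b + c2 * (w - to_barrier)

def simulate_ruin(u, b, c1, c2, lam, mu, horizon, rng):
    """Return (tau, N_tau), or (math.inf, claims so far) if no ruin by horizon."""
    t, x, n = 0.0, u, 0
    while True:
        w = rng.expovariate(lam)             # waiting time to the next claim
        if t + w > horizon:
            return math.inf, n
        t += w
        x = grow(x, w, b, c1, c2) - rng.expovariate(mu)   # claim of mean 1/mu
        n += 1
        if x < 0:
            return t, n

print(simulate_ruin(1.0, 2.0, 1.5, 1.2, 1.0, 1.0, 50.0, random.Random(3)))
```

Repeating the simulation many times and histogramming the ruined paths gives an empirical counterpart of the joint density $w(u, n, t)$, up to the truncation at the horizon.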


3 Preliminaries 


In this section we introduce a few facts used later in this paper. We start by
recalling the translation operator $T_r$; see Dickson and Hipp [4]. For any integrable
real-valued function $f$ it is defined as

\[ T_r f(x) = \int_x^{\infty} e^{-r(y - x)} f(y)\,dy, \qquad r \in \mathbb{C}. \]

The operator $T_r$ satisfies the following properties:

1. $T_s f(0) = \int_0^{\infty} e^{-sx} f(x)\,dx = \hat{f}(s)$, which is the Laplace transform of $f$;

2. The operator $T_r$ is commutative, i.e. $T_s T_r = T_r T_s$. Moreover, for $s \ne r$ and $x \ge 0$

\[ T_s T_r f(x) = T_r T_s f(x) = \frac{T_s f(x) - T_r f(x)}{r - s}. \tag{5} \]

More properties of the translation operator $T_r$ can be found in Li and Garrido [13]
and Gerber and Shiu [8].
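Property (5) can be checked numerically in a simple case. The sketch below assumes an exponential density $f(y) = \mu e^{-\mu y}$, for which $T_r f(x) = \mu e^{-\mu x}/(r + \mu)$ in closed form, and approximates the integrals by a truncated midpoint rule. All names and numerical values are illustrative.

```python
import math

def translate(r, f, x, upper=60.0, n=60000):
    """T_r f(x) = integral over y >= x of exp(-r (y - x)) f(y), midpoint rule."""
    h = upper / n
    return sum(math.exp(-r * (i + 0.5) * h) * f(x + (i + 0.5) * h)
               for i in range(n)) * h

mu = 2.0
f = lambda y: mu * math.exp(-mu * y)            # exponential claim density
s, r, x = 0.5, 1.5, 0.7

T_r_f = lambda y: mu * math.exp(-mu * y) / (r + mu)    # closed form of T_r f
lhs = translate(s, T_r_f, x)                            # T_s T_r f(x)
rhs = (translate(s, f, x) - translate(r, f, x)) / (r - s)   # right side of (5)
print(lhs, rhs)
```

Both numbers agree with $\mu e^{-\mu x} / ((r + \mu)(s + \mu))$, which is what the two sides of (5) reduce to for this density.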

For any function $g$ we will denote by $\hat{g}(s)$ its Laplace transform, that is
$\hat{g}(s) = \int_0^{\infty} e^{-sx} g(x)\,dx$. Next, for $i = 1, 2$ let $\rho_i$ be the positive root of the Lundberg
fundamental equation

\[ c_i s - (\lambda + \delta) + \lambda r \hat{f}(s) = 0. \tag{6} \]

The positive roots always exist for $\delta > 0$; see Fig. 2.
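For a concrete claim distribution the positive root can be found by bisection, since the left-hand side of the Lundberg equation is negative at $s = 0$ (for $\delta > 0$, $r \le 1$) and grows without bound. The sketch below assumes exponential claims with rate mu, so that $\hat{f}(s) = \mu/(\mu + s)$. The function name and the parameter values are hypothetical.

```python
def lundberg_root(c, lam, delta, r, mu, tol=1e-12):
    """Positive root of c*s - (lam + delta) + lam*r*mu/(mu + s) = 0 by bisection."""
    g = lambda s: c * s - (lam + delta) + lam * r * mu / (mu + s)
    lo, hi = 0.0, 1.0
    while g(hi) <= 0.0:          # g(0) < 0 when delta > 0 and r <= 1
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

print(lundberg_root(c=1.5, lam=1.0, delta=0.1, r=0.9, mu=1.0))
```

For exponential claims the equation is quadratic in $s$ after multiplying by $\mu + s$, so the bisection result can be compared with the positive root of $c s^2 + (c\mu - \lambda - \delta)s - \mu(\lambda + \delta - \lambda r) = 0$.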


Lagrange’s Expansion Theorem In this paper we will also use Lagrange’s
Expansion Theorem; see pages 251-326 of Lagrange [10]. Given two functions
$\alpha(z)$ and $\beta(z)$ which are both analytic on and inside a contour $D$ surrounding a
point $a$, if $r$ satisfies the inequality

\[ |r \beta(z)| < |z - a| \tag{7} \]

for every $z$ on the perimeter of $D$, then $z - a - r\beta(z)$, as a function of $z$, has exactly
one zero $\eta$ in the interior of $D$, and we further have:

\[ \alpha(\eta) = \alpha(a) + \sum_{k=1}^{\infty} \frac{r^k}{k!} \frac{d^{k-1}}{dx^{k-1}} \Big( \alpha'(x)\, \beta^k(x) \Big) \Big|_{x = a}. \tag{8} \]

Fig. 2 Roots of Lundberg’s fundamental equation


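Finally, we also define the impulse function

\[ \delta_x(t) = \begin{cases} 0, & t \ne x, \\ \infty, & t = x, \end{cases} \]

with $\int_0^{\infty} \delta_x(t)\,dt = 1$. We denote by $g^{k*}$, $k \ge 0$, with $g^{1*} = g$ and $g^{0*}(t) = \delta_0(t)$, the
$k$-fold convolution of $g$ with itself, where

\[ (g * h)(t) = \int_0^t g(x)\, h(t - x)\,dx, \qquad t \ge 0, \]

for two functions $g$ and $h$ supported on $(0, \infty)$.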
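The convolution powers $g^{k*}$ can be approximated on a grid. The sketch below uses the trapezoidal rule and, as an illustrative assumption, an exponential density with rate mu, whose two-fold convolution is the Erlang density $\mu^2 t e^{-\mu t}$. The function name and grid parameters are hypothetical.

```python
import math

def convolve(g, h, step):
    """Trapezoidal approximation of (g * h)(t_n) on the grid t_n = n * step."""
    out = []
    for n in range(len(g)):
        total = sum(g[j] * h[n - j] for j in range(n + 1))
        total -= 0.5 * (g[0] * h[n] + g[n] * h[0])      # trapezoid end weights
        out.append(step * total)
    return out

mu, step = 2.0, 0.01
grid = [n * step for n in range(301)]
g = [mu * math.exp(-mu * t) for t in grid]
g2 = convolve(g, g, step)                                # approximates g^{2*}
erlang2 = [mu**2 * t * math.exp(-mu * t) for t in grid]  # exact two-fold convolution
print(max(abs(a - b) for a, b in zip(g2, erlang2)))
```

Higher convolution powers follow by applying `convolve` repeatedly to the previous result.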


4 Integro-Differential Equations for the Joint Laplace 
Transform 


In this section, we derive two integro-differential equations identifying $\phi(u)$ defined
in (3). We will follow the idea given in Lin and Pavlova [16]. Denote

\[ \phi(u) = \begin{cases} \phi_1(u), & u \le b, \\ \phi_2(u), & u > b. \end{cases} \tag{9} \]
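Theorem 1 The joint Laplace transform $\phi$ satisfies the following integro-differential
equations:

\[ \phi'(u) = \begin{cases} \dfrac{\lambda + \delta}{c_1}\, \phi_1(u) - \dfrac{\lambda r}{c_1} \displaystyle\int_0^u \phi_1(u - x) f(x)\,dx - \dfrac{\lambda r}{c_1}\, \bar{F}(u), & 0 \le u \le b, \\[2ex] \dfrac{\lambda + \delta}{c_2}\, \phi_2(u) - \dfrac{\lambda r}{c_2} \Big( \displaystyle\int_0^{u - b} \phi_2(u - x) f(x)\,dx + \int_{u - b}^{u} \phi_1(u - x) f(x)\,dx \Big) - \dfrac{\lambda r}{c_2}\, \bar{F}(u), & u > b, \end{cases} \tag{10} \]

with the boundary condition

\[ \phi_1(b) = \phi_2(b+) := \lim_{u \to b^+} \phi_2(u). \tag{11} \]

Remark 1 Note that from the integro-differential equations (10) it follows that the
joint Laplace transform with initial surplus above the barrier depends on the
respective function with initial surplus below the barrier, but the reverse relationship
does not hold true.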


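Proof Let first $0 \le u \le b$. Conditioning on the occurrence of the first claim,
we have two cases: the first claim occurs before the surplus has reached the
barrier level $b$, or it occurs after reaching this barrier. There are also two other cases
at the moment of the arrival of the first claim: either the risk process starts all over
again with a new initial surplus, or the first claim already leads to ruin. Hence:

\[ \begin{aligned} \phi(u) = \phi_1(u) &= \int_0^{\frac{b-u}{c_1}} \lambda r e^{-(\lambda+\delta)t} \Big( \int_0^{u + c_1 t} \phi(u + c_1 t - x) f(x)\,dx + \bar{F}(u + c_1 t) \Big)\,dt \\ &\quad + \int_{\frac{b-u}{c_1}}^{\infty} \lambda r e^{-(\lambda+\delta)t} \Big( \int_0^{b + c_2 (t - \frac{b-u}{c_1})} \phi\big(b + c_2 (t - \tfrac{b-u}{c_1}) - x\big) f(x)\,dx + \bar{F}\big(b + c_2 (t - \tfrac{b-u}{c_1})\big) \Big)\,dt \\ &= \lambda r \Big[ \int_0^{\frac{b-u}{c_1}} e^{-(\lambda+\delta)t}\, \gamma(u + c_1 t)\,dt + \int_{\frac{b-u}{c_1}}^{\infty} e^{-(\lambda+\delta)t}\, \gamma\big(b + c_2 (t - \tfrac{b-u}{c_1})\big)\,dt \Big], \end{aligned} \tag{12} \]

where $\gamma(t) = \int_0^t \phi(t - x) f(x)\,dx + \bar{F}(t)$.

Changing variables in (12) and rearranging leads to the following equation for
$0 \le u \le b$:

\[ \phi_1(u) = \frac{\lambda r}{c_1}\, e^{(\lambda+\delta)u/c_1} \int_u^b e^{-(\lambda+\delta)t/c_1}\, \gamma(t)\,dt + \frac{\lambda r}{c_2}\, e^{(\lambda+\delta)u/c_1}\, e^{(\lambda+\delta)(c_1 - c_2)b/(c_1 c_2)} \int_b^{\infty} e^{-(\lambda+\delta)t/c_2}\, \gamma(t)\,dt. \tag{13} \]

Differentiating both sides of (13) with respect to $u$ yields the first equation.
Similarly, for $u > b$ we have:

\[ \begin{aligned} \phi(u) = \phi_2(u) &= \int_0^{\infty} \lambda r e^{-(\lambda+\delta)t} \Big( \int_0^{u + c_2 t} \phi(u + c_2 t - x) f(x)\,dx + \bar{F}(u + c_2 t) \Big)\,dt \\ &= \lambda r \int_0^{\infty} e^{-(\lambda+\delta)t}\, \gamma(u + c_2 t)\,dt \\ &= \frac{\lambda r}{c_2}\, e^{(\lambda+\delta)u/c_2} \int_u^{\infty} e^{-(\lambda+\delta)t/c_2}\, \gamma(t)\,dt. \end{aligned} \tag{14} \]

Differentiating both sides of (14) with respect to $u$ produces the second equation.
Note also that from Eqs. (13) and (14) it follows that $\phi(u)$ is continuous at $u = b$
and hence (11) holds. This completes the proof.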


5 The Analytical Expression for $\phi(u)$


In this section, we derive the analytical expression for $\phi_i(u)$ $(i = 1, 2)$ using the
translation operator introduced in Sect. 3.


Theorem 2 The function $\phi_2(u)$ can be expressed analytically as follows:

\[ \phi_2(u) = \sum_{n=0}^{\infty} \Big( \frac{\lambda r}{c_2} \Big)^{n+1} (T_{\rho_2} f)^{n*} * h(u - b), \qquad u > b, \tag{15} \]

where

\[ h(u) := \int_u^{u+b} \phi_1(u + b - x)\, T_{\rho_2} f(x)\,dx + T_{\rho_2} \bar{F}(u + b). \tag{16} \]

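Proof We adopt the approach of Willmot and Dickson [19]. Consider the second
equation in (10) for $u > b$. For a fixed $s > 0$, we multiply both sides of this
equation by $c_2 e^{-s(u-b)}$ and integrate with respect to $u$ from $b$ to $\infty$:

\[ \begin{aligned} c_2 \int_b^{\infty} e^{-s(u-b)} \phi_2'(u)\,du &= (\lambda + \delta)\, T_s \phi_2(b) - \lambda r \int_b^{\infty} e^{-s(u-b)} \int_0^{u-b} \phi_2(u - x) f(x)\,dx\,du \\ &\quad - \lambda r \int_b^{\infty} e^{-s(u-b)} \int_0^b \phi_1(y) f(u - y)\,dy\,du - \lambda r\, T_s \bar{F}(b) \\ &= (\lambda + \delta)\, T_s \phi_2(b) - \lambda r \int_0^{\infty} e^{-sx} f(x) \int_{x+b}^{\infty} e^{-s(u-x-b)} \phi_2(u - x)\,du\,dx \\ &\quad - \lambda r \int_0^b \phi_1(y) \int_b^{\infty} e^{-s(u-b)} f(u - y)\,du\,dy - \lambda r\, T_s \bar{F}(b) \\ &= (\lambda + \delta)\, T_s \phi_2(b) - \lambda r \hat{f}(s)\, T_s \phi_2(b) - \lambda r \int_0^b \phi_1(y)\, T_s f(b - y)\,dy - \lambda r\, T_s \bar{F}(b). \end{aligned} \]

Integrating by parts gives:

\[ c_2 \int_b^{\infty} e^{-s(u-b)} \phi_2'(u)\,du = c_2 s\, T_s \phi_2(b) - c_2 \phi_2(b). \]

Hence

\[ c_2 s\, T_s \phi_2(b) - c_2 \phi_2(b) = (\lambda + \delta)\, T_s \phi_2(b) - \lambda r \hat{f}(s)\, T_s \phi_2(b) - \lambda r \int_0^b \phi_1(y)\, T_s f(b - y)\,dy - \lambda r\, T_s \bar{F}(b) \]

and simple rearranging leads to:

\[ \big( c_2 s - (\lambda + \delta) + \lambda r \hat{f}(s) \big)\, T_s \phi_2(b) = c_2 \phi_2(b) - \lambda r \int_0^b \phi_1(y)\, T_s f(b - y)\,dy - \lambda r\, T_s \bar{F}(b). \tag{17} \]

Taking $s = \rho_2$ for the solution $\rho_2$ of the Lundberg fundamental equation (6) gives

\[ c_2 \phi_2(b) = \lambda r \int_0^b \phi_1(y)\, T_{\rho_2} f(b - y)\,dy + \lambda r\, T_{\rho_2} \bar{F}(b). \]

Then Eq. (17) is equivalent to:

\[ \big[ c_2 (s - \rho_2) + \lambda r \hat{f}(s) - \lambda r \hat{f}(\rho_2) \big]\, T_s \phi_2(b) = \lambda r \int_0^b \phi_1(y) \big[ T_{\rho_2} f(b - y) - T_s f(b - y) \big]\,dy + \lambda r \big[ T_{\rho_2} \bar{F}(b) - T_s \bar{F}(b) \big]. \]

Now dividing the above equation by $s - \rho_2$ and using property 2 of the translation
operator introduced in Sect. 3 produces:

\[ c_2\, T_s \phi_2(b) = \lambda r\, T_s T_{\rho_2} f(0)\, T_s \phi_2(b) + \lambda r \int_0^b \phi_1(y)\, T_s T_{\rho_2} f(b - y)\,dy + \lambda r\, T_s T_{\rho_2} \bar{F}(b). \tag{18} \]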
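Inverting the translation operators in (18) yields the following renewal equation for
$\phi_2(u)$:

\[ \phi_2(u) = \frac{\lambda r}{c_2} \Big[ \int_0^{u-b} \phi_2(u - x)\, T_{\rho_2} f(x)\,dx + \int_{u-b}^{u} \phi_1(u - x)\, T_{\rho_2} f(x)\,dx + T_{\rho_2} \bar{F}(u) \Big]. \tag{19} \]

Taking $y = u - b$ and $g(y) = \phi_2(y + b)$ we can rewrite (19) as follows:

\[ g(y) = \frac{\lambda r}{c_2} \int_0^y g(y - x)\, T_{\rho_2} f(x)\,dx + \frac{\lambda r}{c_2}\, h(y), \qquad y > 0, \]

where

\[ h(y) = h(u - b) = \int_{u-b}^{u} \phi_1(u - x)\, T_{\rho_2} f(x)\,dx + T_{\rho_2} \bar{F}(u), \qquad u > b. \]

Hence

\[ \begin{aligned} \phi_2(u) = g(y) &= \frac{\lambda r}{c_2} \int_0^y g(y - x)\, T_{\rho_2} f(x)\,dx + \frac{\lambda r}{c_2}\, h(y) \\ &= \sum_{n=0}^{\infty} \Big( \frac{\lambda r}{c_2} \Big)^{n+1} (T_{\rho_2} f)^{n*} * h(y) \\ &= \sum_{n=0}^{\infty} \Big( \frac{\lambda r}{c_2} \Big)^{n+1} (T_{\rho_2} f)^{n*} * h(u - b), \end{aligned} \]

which completes the proof.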


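The expression for $\phi_1(u)$ can also be derived in terms of the translation
operator.


Theorem 3 The function $\phi_1(u)$ can be expressed analytically in the following form:

\[ \phi_1(u) = \phi_\infty(u) + \frac{\frac{\lambda r}{c_2} \big[ \phi_\infty * T_{\rho_2} f(b) + T_{\rho_2} \bar{F}(b) \big] - \phi_\infty(b)}{v(b) - \frac{\lambda r}{c_2}\, v * T_{\rho_2} f(b)}\; v(u), \tag{20} \]

where

\[ \phi_\infty(u) = \sum_{n=0}^{\infty} \Big( \frac{\lambda r}{c_1} \Big)^{n+1} (T_{\rho_1} f)^{n*} * T_{\rho_1} \bar{F}(u) \tag{21} \]

and

\[ v(x) := \sum_{n=0}^{\infty} \Big( \frac{\lambda r}{c_1} \Big)^{n} (T_{\rho_1} f)^{n*} * p(x) \tag{22} \]

with $p(x) = e^{\rho_1 x}$.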
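Proof We will follow Landriault et al. [11]. Note that the first equation in (10) does
not involve the barrier level $b$:

\[ \phi_1'(u) = \frac{\lambda + \delta}{c_1}\, \phi_1(u) - \frac{\lambda r}{c_1} \int_0^u \phi_1(u - x) f(x)\,dx - \frac{\lambda r}{c_1}\, \bar{F}(u). \tag{23} \]

The information about the barrier $b$ is included in the boundary condition:

\[ \phi_1(b) = \phi_2(b+) = \lim_{u \to b^+} \phi_2(u). \]

Lin et al. [16] showed that the general solution of (23) is of the form

\[ \phi_1(u) = \phi_\infty(u) + k\, v(u), \tag{24} \]

where $\phi_\infty(u)$ is the joint Laplace transform of the density of the ruin time and the number
of claims counted up to the ruin time for the classical risk process (1) without any barrier
applied. That is,

\[ \phi_\infty(u) := \sum_{n=1}^{\infty} r^n \int_0^{\infty} e^{-\delta t}\, w_\infty(u, n, t)\,dt \tag{25} \]

for

\[ w_\infty(u, n, t) = P(N_\tau = n, \ \tau \in dt \mid U(0) = u)/dt. \tag{26} \]

In Eq. (24) the quantity $k$ is a constant which we can specify by
using (24) in (19) together with the boundary condition (11):

\[ k = \frac{\frac{\lambda r}{c_2} \big[ \phi_\infty * T_{\rho_2} f(b) + T_{\rho_2} \bar{F}(b) \big] - \phi_\infty(b)}{v(b) - \frac{\lambda r}{c_2}\, v * T_{\rho_2} f(b)}. \tag{27} \]

We now express the function $\phi_\infty$ in terms of a compound geometric distribution.
Indeed, since $\phi_\infty$ also satisfies Eq. (23), taking Laplace transforms of both its sides
for sufficiently large $s$ gives:

\[ \big( c_1 s - (\lambda + \delta) + \lambda r \hat{f}(s) \big)\, \hat{\phi}_\infty(s) = c_1 \phi_\infty(0) - \lambda r\, \hat{\bar{F}}(s), \qquad \delta > 0. \tag{28} \]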
To determine the constant term $c_1 \phi_\infty(0)$ in (28), we substitute the solution $\rho_1$ of the
Lundberg fundamental equation (6) for $s$:

\[ c_1 \phi_\infty(0) = \lambda r\, \hat{\bar{F}}(\rho_1) = \lambda r\, T_{\rho_1} \bar{F}(0). \tag{29} \]

Consequently, Eq. (28) reduces to

\[ \big[ c_1 (s - \rho_1) + \lambda r \hat{f}(s) - \lambda r \hat{f}(\rho_1) \big]\, \hat{\phi}_\infty(s) = \lambda r\, \hat{\bar{F}}(\rho_1) - \lambda r\, \hat{\bar{F}}(s). \]

Dividing the above equation by $s - \rho_1$ and rearranging, using
formula (5), produces:

\[ c_1 \hat{\phi}_\infty(s) = \lambda r\, \hat{\phi}_\infty(s)\, T_s T_{\rho_1} f(0) + \lambda r\, T_s T_{\rho_1} \bar{F}(0). \]

Inverting these Laplace transforms gives the classical renewal equation

\[ \phi_\infty(u) = \frac{\lambda r}{c_1}\, \phi_\infty * T_{\rho_1} f(u) + \frac{\lambda r}{c_1}\, T_{\rho_1} \bar{F}(u), \tag{30} \]

whose solution is given by the Neumann series (21).
To prove the last statement (22) note that the function $v(u)$ satisfies the following
integro-differential equation:

\[ c_1 v'(u) - (\lambda + \delta)\, v(u) + \lambda r \int_0^u v(u - x) f(x)\,dx = 0, \qquad u \ge 0, \tag{31} \]

with the initial condition $v(0) = 1$. To get the analytical expression of $v(u)$ we take
the Laplace transforms of both sides of (31) for sufficiently large $s$ ($s > \rho_1$). This
yields:

\[ c_1 s\, \hat{v}(s) - c_1 v(0) = (\lambda + \delta)\, \hat{v}(s) - \lambda r \hat{f}(s)\, \hat{v}(s). \]

Since $v(0) = 1$,

\[ \Big( s - \frac{\lambda + \delta}{c_1} + \frac{\lambda r}{c_1}\, \hat{f}(s) \Big)\, \hat{v}(s) = 1. \tag{32} \]

Recalling that $\rho_1$ is the root of (6), we can rewrite (32) as

\[ \Big( s - \rho_1 + \frac{\lambda r}{c_1}\, \hat{f}(s) - \frac{\lambda r}{c_1}\, \hat{f}(\rho_1) \Big)\, \hat{v}(s) = 1, \]

which, by dividing by $s - \rho_1$ and using (5), produces:

\[ \hat{v}(s) = \frac{\lambda r}{c_1}\, \hat{v}(s)\, T_s T_{\rho_1} f(0) + \frac{1}{s - \rho_1}. \tag{33} \]

Inverting the Laplace transforms in (33) leads to Eq. (22). Including all the above
identities in (24) completes the proof.
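For exponentially distributed claims the Neumann series for the barrier-free transform can be summed in closed form, which gives a quick consistency check. The sketch below is not part of the paper: it assumes $f(x) = \mu e^{-\mu x}$, for which $T_\rho f(x) = \mu e^{-\mu x}/(\rho + \mu)$ and $T_\rho \bar{F}(x) = e^{-\mu x}/(\rho + \mu)$, so that the $n$-th term of the series equals $a^{n+1} (\mu/(\rho+\mu))^n (\rho+\mu)^{-1} u^n e^{-\mu u}/n!$ with $a = \lambda r / c$. The function names and parameter values are hypothetical.

```python
import math

def rho_exponential(c, lam, delta, r, mu):
    """Positive root of the Lundberg equation when f(x) = mu * exp(-mu * x)."""
    B = c * mu - lam - delta
    return (-B + math.sqrt(B * B + 4.0 * c * mu * (lam + delta - lam * r))) / (2.0 * c)

def phi_inf_series(u, c, lam, delta, r, mu, terms=80):
    """Partial sum of the Neumann series for exponential claims."""
    rho = rho_exponential(c, lam, delta, r, mu)
    a = lam * r / c
    return sum(a**(n + 1) * (mu / (rho + mu))**n / (rho + mu)
               * u**n * math.exp(-mu * u) / math.factorial(n) for n in range(terms))

def phi_inf_closed(u, c, lam, delta, r, mu):
    """Closed form obtained by summing the exponential series."""
    rho = rho_exponential(c, lam, delta, r, mu)
    a = lam * r / c
    return a / (rho + mu) * math.exp(-mu * (1.0 - a / (rho + mu)) * u)

print(phi_inf_series(2.0, 1.5, 1.0, 0.1, 0.9, 1.0),
      phi_inf_closed(2.0, 1.5, 1.0, 0.1, 0.9, 1.0))
```

In the limiting case $\delta = 0$, $r = 1$ the root is $\rho = 0$ and the closed form reduces to the classical ruin probability $\frac{\lambda}{c\mu} e^{-(\mu - \lambda/c)u}$ for exponential claims.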


6 The Joint Density of $(\tau, N_\tau)$


In this section we give the joint density of the number of claims counted until the ruin
time and the ruin time, using Lagrange’s Expansion Theorem. We start with a few
facts that will be useful in the proof of the main result.
Recall that by $w_\infty(u, n, t)$ we denote the joint density of $(\tau, N_\tau)$ for the classical
risk process (1) (with infinite barrier $b = +\infty$); see (26). For $i = 1, 2$ we denote

\[ g_i(x, 0, t) = \delta_{x/c_i}(t)\, e^{-\lambda t}, \]
\[ g_i(x, n, t) = x\, t^{n-1} e^{-\lambda t} \lambda^n f^{n*}(c_i t - x)/n!, \qquad n \ge 1. \]

Following Dickson [3] we can state the following lemma.


Lemma 1 We have

\[ w_\infty(u, 1, t) = \lambda e^{-\lambda t}\, \bar{F}(u + c_1 t). \]
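For $n = 1, 2, 3, \dots$ the following holds:

\[ w_\infty(u, n + 1, t) = \frac{(\lambda t)^n}{n!}\, e^{-\lambda t} \int_0^{u + c_1 t} f^{n*}(u + c_1 t - x)\, \lambda \bar{F}(x)\,dx - c_1 \sum_{j=1}^{n} \int_0^t \frac{(\lambda s)^j}{j!}\, e^{-\lambda s} f^{j*}(u + c_1 s)\, w_\infty(0, n + 1 - j, t - s)\,ds, \tag{34} \]

where

\[ w_\infty(0, n, t) = \frac{\lambda}{c_1} \int_0^{c_1 t} \bar{F}(x)\, g_1(x, n - 1, t)\,dx, \qquad n = 1, 2, \dots. \tag{35} \]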
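Proof Using Lagrange’s Expansion Theorem presented in Sect. 3 with $\alpha(z) = e^{-xz}$,
$\beta(z) = -\lambda \hat{f}(z)/c_i$, $a = (\lambda + \delta)/c_i$ and $D = \{z : |z - a| < a\}$ $(i = 1, 2)$ and the
Lundberg fundamental equation (6) we can conclude the following identity:

\[ \begin{aligned} e^{-\rho_i x} &= e^{-(\lambda+\delta)x/c_i} + \sum_{n=1}^{\infty} \frac{r^n}{n!} \frac{d^{n-1}}{ds^{n-1}} \Big( -x e^{-sx} \Big( -\frac{\lambda \hat{f}(s)}{c_i} \Big)^{n} \Big) \Big|_{s = (\lambda+\delta)/c_i} \\ &= e^{-(\lambda+\delta)x/c_i} + \sum_{n=1}^{\infty} \frac{r^n \lambda^n}{n!\, c_i^n} (-1)^{n-1} x\, \frac{d^{n-1}}{ds^{n-1}} \Big( \int_0^{\infty} e^{-s(x+y)} f^{n*}(y)\,dy \Big) \Big|_{s = (\lambda+\delta)/c_i} \\ &= e^{-(\lambda+\delta)x/c_i} + \sum_{n=1}^{\infty} \frac{\lambda^n r^n}{n!\, c_i^n} \int_0^{\infty} x (x + y)^{n-1} e^{-(\lambda+\delta)(x+y)/c_i} f^{n*}(y)\,dy. \end{aligned} \]

Substituting $t := (x + y)/c_i$ and rearranging leads to:

\[ \begin{aligned} e^{-\rho_i x} &= e^{-(\lambda+\delta)x/c_i} + \sum_{n=1}^{\infty} \frac{\lambda^n r^n}{n!} \int_{x/c_i}^{\infty} e^{-(\lambda+\delta)t}\, x\, t^{n-1} f^{n*}(c_i t - x)\,dt \\ &= \sum_{n=0}^{\infty} r^n \int_{x/c_i}^{\infty} e^{-\delta t}\, g_i(x, n, t)\,dt. \end{aligned} \tag{36} \]

Therefore,

\[ \begin{aligned} T_{\rho_i} f(x) &= \int_x^{\infty} e^{-\rho_i (u - x)} f(u)\,du \\ &= \int_x^{\infty} \sum_{n=0}^{\infty} r^n \int_{(u-x)/c_i}^{\infty} e^{-\delta t}\, g_i(u - x, n, t)\,dt\, f(u)\,du \\ &= \sum_{n=0}^{\infty} r^n \int_0^{\infty} e^{-\delta t} \int_x^{c_i t + x} f(u)\, g_i(u - x, n, t)\,du\,dt. \end{aligned} \tag{37} \]

Since $\phi_\infty(u)$ defined in (25) is the joint Laplace transform under the classical
compound Poisson risk model without a barrier, we can use Dickson [3] to complete
the proof.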

Moreover, the following result holds true. 


Lemma 2 The function v(u) given in (22) equals 


0 ioe) 
v(u) = px nf ea (u,n, t)dt, (38) 
n=0 0 
where


w(u,0,t) := g,(—u,0, ft), 


n 


aA\™ peit pu 
w(u,n,t) = = (=) [ [ 8c} (y,n—m, t)bm(u —x, y+x)dxdy 4 8cj(—u,n,t), n=l 


m=1 


n—1 


n (-1/ s n—1 ¢(n—j)* jk 
bau, y) = > : —x J +u-— J dx. 
(u, y) & (‘) Ti hy (u — x) f (y tu—x)f/*(x)dx 


Proof Our goal is to express v(u) as the Laplace transform: 


[oe 
v(u) = / e Pl E(u, t)dt. (39) 
0 
We start from definition (22): 


(oe) X n 
vw = >> (=) (Tp, f)"* * p(u) 


n=0 
far 7 
-» (=) [ann u=serax +e (40) 
cl 0 
n=1 


Using Dickson and Willmot [5] we can obtain the following representation: 


(Tp; f)"" (Ww) = [ e °bn(u, y)dy (41) 


n-1 =| j u 
a= (') aa i (u— xy! fOP*(y tw — x) fda. 
j=0 


Ar\" f% ¢® : 
— i i e "'*by(u — x, y)dye?'*dx + ef!" 
Cl 0 0 


Ar\" fe e 
*) / e pit / b,(u—x,t +x)dxdt 
Cl 0 0 


lee) ar\" 0 u oo 
+> (=) i a by(u = x,1¢adxdr+ f eP'§_(t)dt. 
cl —u = 0 
Comparing the coefficients of $e^{-\rho_1 t}$ in (39) gives:


(oe) 


eu,)= >) (=) [ by(u — x,t + x)dx + bu (0): (42) 


n=1 


see also [11]. Using (36) and (42) in (39) we end up with: 
= i —piy ,y)dy piu 
vin) = fo meu, vidy +e 


co © co F 
=f oe [Menon nate ydy 
0 y/e1 


n=0 


asd foe) cyt 
7 oe a Bei (Vn, E(u, y)dydt 


n=0 


ad foe) n a cyt u 
= a eo vie [ i 8c, (y,n — mM, t)bm(u — x, y + x)dxdy + ge, (—u, n, t) | dt 
0 1 0 0 


n=1 m=1 


CO 
+f eg, (—u, 0, t)dt 
0 


which completes the proof. 
Using above lemmas we will prove the main result of this paper. 


Theorem 4 For $0 \le u \le b$ and $m \ge 1$ the joint density of the number of claims
until ruin $N_\tau$ and the time to ruin $\tau$ is given by


A —M BP c2 
MET F(cot +b+ a 


m m—1 b 


— hb b or b 
wi(u,m,t)=e % YP amnr-2)-y f o(b,m—n,t — — — z)wy(u,n, z)dz}, 
n=1 Cl n=1 0 “1 


(43) 
where forn > 1 


c(b, 0, t) := w(b, 0, t) = gc, (—b, 0, 1), 


n—1 


co(b,n,t) := w(b,n,t) — > 


m=0 


5 b et cqzZ+x 
mal i a(b—x,n—1-mt—2 f Sf (y)g2(y — x, m, z)dydzdx, 
c2 «JO 0 x 


HY cottb _ 
yb, 1,1) i= mal F(y)go(y — b, 0, dy, 
2 Sb 


n—2 Xr b et C2Z4+xX 
y(b,n, t) = >= / i Wood = x50 =m = 1412) | SO) g2(y — x, m, z)dydzdx, 
2/0 JO x 


m=0 


t 


v(u, m,n, t) =) o(b,m —n,t — Z)Woo(u,n, Z) + (V(b, n, t — Z) — Woo (bn, t — z)) o(u,m — n, z)dz. 
0 
Proof In order to get the joint density $w(u, n, t)$, we have to take the inverse Laplace
transform with respect to $\delta$ rather than $\rho_1$ and $\rho_2$. To do this we must first find
the relationship between the transforms with respect to $\rho_1$, $\rho_2$ and $\delta$ by applying
Lagrange’s Expansion Theorem. For convenience, we will denote:

\[ \chi(b) = v(b) - \frac{\lambda r}{c_2}\, v * T_{\rho_2} f(b). \tag{44} \]

Then we can rewrite (20) as follows:

\[ \chi(b)\, \phi_1(u) = \chi(b)\, \phi_\infty(u) + \frac{\lambda r}{c_2} \big[ \phi_\infty * T_{\rho_2} f(b) + T_{\rho_2} \bar{F}(b) \big]\, v(u) - \phi_\infty(b)\, v(u). \tag{45} \]
Putting (37) and (38) into (44) we will derive: 
= n i —ét Ar 7 
x(b) = Yor fi e °a@(b,n, t)dt — =| v(b — x)T py f (x)dx 
n=O 0 c2 JO 
oo lo) ar oO n b poo foe) 
= n —dt b,n,t)dt — n ie! —dt b—x,n— sat f —dz 

es ow (b,n, t)a me hae 2s ae w( x,n—m,t) oe 


n=0 n=0 m=0 


c2Z+Xx 
| f(y) g2(y — x, m, z)dydzdx 
x 


n—-1 


kasd b pt 
=) eM artb.n.tndt = Yoo [ey > f | w(b—x,n—1—m,t—z) 
0 0 c2 JO 0 


n=1 m=0 


coz+x 
| f(y)g2(y — x, m, z)dydzdx}dt 
x 


00 os n—1 h b et c2Z+X 
ay) ony - Y= | [ m(b—x.n—1-mt—2 f f(y) 
n=1 0 m=0 €2 JO v0 < 


oO 


go(y — x,m, z)dydzdx}dt +f ew (b, 0, t)dt 
0 


= pas oe e'c(b, n, tdt. (46) 
0 


n=0 
Similarly, using Lemma 1, we can check that: 


oo ioe) 


[boo * Ty f (b) + Tp F(b)] = Sor" / e 'y(b,n, t)dt. (47) 


n=1 0 
Using (38), (46) and (47) in (45) we obtain:
oo ee) 
Ss rm / et 
m=1 9 


[oe) fore) m t 
= Daa) aa (y(b, n, t — Z) — Woo(b, n, t — Z)) O(u, m — n, z)dzdt 
0) 0 


m=1 n=1 


>| ¢(b,m—n,t —z) (wi (u, n, Z) — Woo(u, n, z)) dzdt 
0 


n=1 


or equivalently that 


m t 
yi s(b,m—n,t — z)wi(u,n, z)dz 
0 


n=1 


m t 
= Be | ¢(b,m =n, t = Z)Woo(u,n, 2) 
0 


n=1 


or (y (b, n,t — Zz) = Woo(b, n, i= z)) w(u,m —*fn, z)dz. 


Now, if m = 1 then


t 
i ¢(b,0,t —z)wi(u, 1, z)dz = Ou, 1, 1,2). 
0 
In this case 
' Ab Ab b 
; d—b/e (t — z)et wi(u, 1, z)dz = eI wi(u, 1,t+ = 
0 1 
a _ 
= op Rett baw 
c2 Ci 
and 


XK _ By Cc 
wile, ps —e OS Per tb 2G =). 
C2 Cl 


Similarly, if m = 2 then 


t ab 2 t 
/ b_p/e, (t — ze wi (u, 2, z)dz = Yi ow, 2,n,t) =f ¢(b,1,t — z)wi(u, 1, z)dz 
0 0 


n=1 


and 


0 


_» |< db. pta b 
wr, 2.0) =e | YG mt — 2)— fT eb, 10— 2 — shui 1 zdde | 
Cl Cl 


n=1 


Similarly we can prove the assertion for any m > 1. 
Theorem 5 For $u > b$ and $m \ge 1$ the joint density of the number of claims until
ruin $N_\tau$ and the time to ruin $\tau$ is given by


m—lm—k-1 pay—b pt peoz 
wr(u,m,)=(2" > [ i I 82(9, k, Z)Bm—k—-n—1(u — b — x, y)e(a, n, t — z)dydzdx, 
C2 =0 n=0 °° Oreo 
(48) 
where 
cottutb _ 
bu, 0 i; F(y)g2(y — u —b, 0, dy, 
ut+b 
m u+b t C2Z+x 

é(u, m, t) =p ae i; wut b—xn1—2) f f(y)g2(y — x,m — n, z)dydzdx 
n=1°" 0 - 


cottutb _ 
+f F(y)go(y —u—b,m, z)dy, n=l. 
u+b 


Proof To obtain an expression for w2(u, m, t) we first consider h(x) defined in (16). 
Using (37) we can derive: 


utb © ioe) 
hu) =| ee] e 'wi(u +b — x, m, t)dt 


i“ m=1 0 


oo lee) coZ+x 
x aa aa fO)g2(y — x, n, z)dydzdx 
n=0 0 * 
oo ee) cgttutb _ 
+ yee fet fo Fone —u = b.nndyar 
n=0 0 u+b 


oo les) n utb et 
3g) aly f / wi(u+b—x,m,t —z) 
= 0 Uu 0 


m=1 


cqz+x 
x / f(y) g2(y — x, n — m, z)dydzdx 
e 4 


cottutb _ 
“h / F(y)go(y —u—b,n, cydy Jar (49) 
u+b 


[e-s) cott+ut+b 2 
= / em / F(y)g2(y — u — b, 0, t)dydt 
0 ut+b 
lee) 


(oe) 


=yor" i eeu, n, pdt. (50) 
Moreover, substituting (41), (49) and (36) into (15) gives:


ee) ar m+l1 = pu—b 
g2(u) = x; (=) [ (Tp, fy"* (u — b — x)h(x)dx 


c2 


m=0 
co ar m+l pu—b poo ac oo 
= a (=) [ [ e 2’ by(u — b— x, y)dy >» r” [ e* e(x, n, tdtdx 
c2 0 0 0 
m=0 n=0 
yr \mtl pu-b poo & rf F 
= > (=) d. [ S / e “go(y,k, z)dzbm(u — b — x, y)dy 
may 6° 0 40 R06 9/62 
00 co 
x a al ec e(x, n, t)dtdx 
n=0 0 
fore) 00 ,\m m—lm—k-1 ay—b pt oz 
= ey @|(2) yy [ [ [ 82(y,k, Z)bm—k-n—1(u — b — x, y) 
mal °9 at fey geo 78 #0 
xe(x,n,t— c)dydeds hat. (51) 


Comparing Eqs. (51) and (4) completes the proof. 
Numerical Approximations to Distributions of Weighted
Kolmogorov-Smirnov Statistics via Integral Equations


Dan Wu, Lin Yee Hin, Nino Kordzakhia, and Alexander Novikov 


Abstract We show that the distribution of two-sided weighted Kolmogorov- 
Smirnov (wK-S) statistics can be obtained via the solution of the system of two 
Volterra type integral equations for corresponding boundary crossing probabilities 
for a diffusion process. Based on this result we propose a numerical approximation 
method for evaluating the distribution of wK-S statistics. We provide the numerical 
solutions to the system of the integral equations which were also verified via Monte 
Carlo simulations. 


1 Introduction 


The applications of one-sided and two-sided weighted Kolmogorov-Smirnov 
(wK-S) statistical tests are ubiquitous in diverse areas of applications, including 
physics, finance, computational biology and Gene Set Enrichment Analysis 
(GSEA), see e.g. [8, 14, 21]. In some cases there exist modifications of the wK-S 
whose limit distributions (for large sample sizes) can be represented as the following
random variable

$$D_{g,f} = \sup_{t \in T} \frac{|B_t - g(t)\xi|}{f(t)}, \tag{1.1}$$

where $g(t)$ and $f(t)$ are some deterministic functions of $t$, $B = \{B_t, t \in [0, 1]\}$
is a standard Brownian bridge, the random variable $\xi$ is independent of $B$ and has
the standard normal distribution, $\xi \sim N(0,1)$, and $T \subseteq [0, 1]$. Note that analytical
expressions for the distribution function $P\{D_{g,f} < x\}$, $x > 0$, are not available in
closed form except in the classical case $g(t) = 0$, $f(t) = 1$ (see Kolmogorov
[15]).
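
For orientation, the classical case can be evaluated directly. The sketch below uses the well-known alternating series $P\{D_{0,1} < x\} = 1 - 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2x^2}$; the function name and the truncation parameter `terms` are illustrative choices, not part of the paper.

```python
import math

def kolmogorov_cdf(x, terms=100):
    """P{sup_{0<=t<=1} |B_t| < x} for a standard Brownian bridge B.

    Uses the classical alternating series, truncated after `terms` terms.
    The truncation is adequate for moderate x; very small x needs more terms.
    """
    if x <= 0.0:
        return 0.0
    tail = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                     for k in range(1, terms + 1))
    # Clamp to [0, 1] to guard against rounding in the truncated series.
    return max(0.0, min(1.0, 1.0 - tail))
```

Such a closed form provides a convenient reference value when checking any numerical scheme in the special case $g = 0$, $f = 1$.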

Recent applications of wK-S in GSEA (see e.g. [6]) require the development of
fast and accurate numerical approximations for the cumulative distribution functions
(cdf) of $D_{g,f}$ for specific functions $f$ and $g$.

This paper addresses the issue of approximating the cdf of $D_{g,f}$ under the
following two important settings:


1. $f(t) = 1$, $g \in \{g(t) = t^{\alpha} - t,\ 1/2 < \alpha < 1\}$, $t \in T = [0, 1]$,
2. $f(t) = \sqrt{t(1-t)}$, $t \in T = [a, b]$, $g(t) = 0$, $0 < a < b < 1$.


Our goal is to find accurate numerical approximations for the following
corresponding tail distributions

$$P_1(x) := P\Big\{\sup_{t \in [0,1]} |B_t - g(t)\xi| > x\Big\}; \qquad P_2(x) := P\Big\{\sup_{t \in [a,b]} \frac{|B_t|}{\sqrt{t(1-t)}} > x\Big\}.$$


Setting 1 was recently discussed in the context of GSEA, see e.g. [6, 16, 17].
The family of functions $g$ is of special relevance there. In particular, the case $\alpha =
2/3$ in Setting 1 corresponds to GSEA analysis where the weights of the genes in
question are replaced by their respective ranks obtained from their expressions
in typical experiments. Examples of such gene expression profiles are accessible
from the Gene Expression Omnibus repository [12]. Note that $g = 0$ corresponds to
the classical Kolmogorov-Smirnov test statistic, for which the closed-form expression
for $P\{D_{0,1} < x\}$ is well known [15]. See also [10], [11] and [20] for a historical
account.

Setting 2 corresponds to the wK-S test suggested by Anderson and Darling [1].
It is designed to increase the sensitivity to the tails of empirical
distributions compared to the classical K-S test. Some asymptotic results for the tail
probabilities under this setting have been derived recently in [7].

In general, finding the distribution of $D_{g,f}$ is a computationally intensive
numerical problem. Many different approaches have been devised to address it
from different perspectives, each with its own merits and shortcomings.
In [9] and [2], among others, the authors reduce the
problem of approximating the extrema of modified Brownian bridges to finding
boundary crossing probabilities (BCP) with respect to Brownian motion. In line
with this approach, piecewise linear boundaries were used to replace nonlinear
boundaries and to approximate the desired distribution by n-dimensional integrals
in a similar way to that used in [3, 18], and [22]. The convenient feature of this approach,
as demonstrated in [17] and [16] for the case of the one-sided version of wK-S, is
the possibility of obtaining analytical upper and lower bounds for the tail distributions
of the statistics, which lead to fast and reasonably accurate approximations. However,
for the case of the two-sided wK-S this approach incurs substantial computational
cost in situations when highly accurate approximations are needed.

Historically it was Kolmogorov [15] who first found the distribution of $D_{0,1}$
as a solution of a partial differential equation (PDE). Subsequently, Anderson
and Darling [1] applied this approach to related problems in
the construction of goodness-of-fit tests. Under Settings 1 and 2 it is possible to
use finite-difference schemes or the finite element method to obtain numerical
approximations. However, the PDE approach does not seem to be computationally
efficient: not only are function evaluations required at a larger
number of discretised points, but a substantially higher computational burden is
incurred by the element assembly process [4].

The technique discussed in our paper is inspired by the work of Peskir (see
Theorem 2.2 in [19]) who derived an integral equation of Volterra type for BCP
with one-sided boundaries. In this paper we extend this technique to BCP
with two-sided boundaries, deriving a system of two integral equations of Volterra
type. Note that this system of integral equations was derived by
Buonocore et al. [5] using a different approach. In fact, our technique
is applicable to all regular diffusion processes whose transition probabilities are
available in closed form. The advantage of this approach is that a system of integral
equations is rather straightforward to obtain for all one-dimensional diffusion
processes, and efficient numerical techniques can be easily developed to solve the
equations. The complete results, including the case of general diffusion processes,
will be presented in another publication.

The paper is organised as follows. In Sect. 2 we formulate the results which
provide a system of two Volterra integral equations for $P\{D_{g,f} < x\}$. In Sect. 3
we describe the numerical algorithm and results as well as comparisons with Monte
Carlo simulation.


2 Construction of a System of Integral Equations 
of Volterra Type 


In this section, we formulate a general result on BCP for two-sided boundaries
which will be used for deriving a system of integral equations to evaluate the
distribution of $D_{g,f}$.

Note that in Setting 1 we need to find BCP of the following form

$$P_1(x, y) = P\Big\{\sup_{t \in [0,1]} |B_t - g(t)y| < x\Big\} = P\{L(t) < B_t < U(t),\ t \in [0, 1]\},$$

where

$$L(t) = -x + g(t)y, \quad U(t) = x + g(t)y, \quad x > 0,\ y \in \mathbb{R}.$$
Having the function $P_1(x, y)$, the distribution of two-sided wK-S statistics
can be evaluated via integration with respect to the density function
$\varphi(y, \sigma) := \frac{1}{\sigma\sqrt{2\pi}} e^{-(y/\sigma)^2/2}$:

$$P_1(x) = 1 - P\{D_{g,1} < x\} = \int_{-\infty}^{\infty} \varphi(y, 1)\,(1 - P_1(x, y))\,dy. \tag{2.1}$$


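In Setting 2 a similar integral representation holds due to the following relations:

$$\begin{aligned}
P_2(x) &= P\Big\{\sup_{t \in [a,b]} \frac{|B_t|}{\sqrt{t(1-t)}} > x\Big\} \\
&= P\big(|B_a| > x\sqrt{a(1-a)}\big) + P\Big(\big\{|B_a| \le x\sqrt{a(1-a)}\big\} \cap \Big\{\sup_{t \in [a,b]} \frac{|B_t|}{\sqrt{t(1-t)}} > x\Big\}\Big) \\
&= 1 - \operatorname{erf}(x/\sqrt{2}) + \int_{-x\sqrt{a(1-a)}}^{x\sqrt{a(1-a)}} \varphi\big(y, \sqrt{a(1-a)}\big)\,(1 - P_2(x, y))\,dy
\end{aligned} \tag{2.2}$$

with

$$P_2(x, y) = P\{L(t) < B_t < U(t),\ t \in [a, b] \mid B_a = y\}, \quad y \in \mathbb{R},$$

$$L(t) = -x\sqrt{t(1-t)}, \quad U(t) = x\sqrt{t(1-t)}, \quad x > 0.$$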


To derive equations for BCP under the general setting for a general diffusion
process $X = \{X_t, t \ge 0\}$ (defined on a suitable probability space) we set

$$\tau_L = \inf_{t \ge t_0}\{t : X_t \le L(t);\ X_s < U(s),\ \forall s \in (t_0, t) \mid X_{t_0} = x_0\}, \tag{2.3}$$

$$\tau_U = \inf_{t \ge t_0}\{t : X_t = U(t);\ X_s > L(s),\ \forall s \in (t_0, t) \mid X_{t_0} = x_0\}, \tag{2.4}$$

$$\tau = \inf_{t \ge t_0}\{t : X_t \notin (L(t), U(t)) \mid X_{t_0} = x_0\} = \inf\{\tau_L, \tau_U\}. \tag{2.5}$$

Let $f_L(t|x_0, t_0)$, $f_U(t|x_0, t_0)$ and $f(t|x_0, t_0)$ be the densities of $\tau_L$, $\tau_U$ and $\tau$
respectively (subject to their existence), given $X_{t_0} = x_0$.
Now we state a modified version of Theorem 2.2 in [19] for the two-sided BCP.
Theorem 1 Let $X$ be a one-dimensional diffusion process with boundaries $U$ and
$L$ being continuously differentiable functions satisfying the inequalities $L(s) < x_0 <
U(s)$ and $L(t) < U(t)$ for all $t > s$. Let $F_U$ and $F_L$ be the cumulative distribution
functions of $\tau_U$ and $\tau_L$, respectively. The following system of integral equations

$$P(G_1, t|x_0, s) = \int_s^t P(G_1, t|U(v), v)\,F_U(dv|x_0, s) + \int_s^t P(G_1, t|L(v), v)\,F_L(dv|x_0, s), \tag{2.6}$$

$$P(G_2, t|x_0, s) = \int_s^t P(G_2, t|U(v), v)\,F_U(dv|x_0, s) + \int_s^t P(G_2, t|L(v), v)\,F_L(dv|x_0, s), \tag{2.7}$$

holds for any measurable sets $G_1 \subseteq [U(t), \infty)$ and $G_2 \subseteq (-\infty, L(t)]$.


The proof of this result will be presented in a full version of this paper; we 
just mention that it is based on the use of the Chapman-Kolmogorov equation as 
a starting point. 

In both Settings 1 and 2 we need to derive equations for the case when $X = B$
is a standard Brownian bridge. Since $B$ is a Gauss-Markov process we have

$$P(y, t|x, s) \sim N\Big(\frac{R(s,t)}{R(s,s)}\,x,\ R(t,t) - \frac{R^2(s,t)}{R(s,s)}\Big), \tag{2.8}$$

where $R$ is the covariance function of $B$. Using this representation, substituting
the initial condition $B_{t_0} = x_0$ into Eqs. (2.6) and (2.7), and letting

$$\Psi(y|x, s) = \Psi\Bigg(\frac{y - \frac{1-t}{1-s}\,x}{\sqrt{\frac{(t-s)(1-t)}{1-s}}}\Bigg), \qquad \Phi(y|x, s) = \Phi\Bigg(\frac{y - \frac{1-t}{1-s}\,x}{\sqrt{\frac{(t-s)(1-t)}{1-s}}}\Bigg),$$

we have

$$\Psi(U(t)|x_0, t_0) = \int_{t_0}^t \Psi(U(t)|U(s), s)\,f_U(s|x_0, t_0)\,ds + \int_{t_0}^t \Psi(U(t)|L(s), s)\,f_L(s|x_0, t_0)\,ds, \tag{2.9}$$

$$\Phi(L(t)|x_0, t_0) = \int_{t_0}^t \Phi(L(t)|U(s), s)\,f_U(s|x_0, t_0)\,ds + \int_{t_0}^t \Phi(L(t)|L(s), s)\,f_L(s|x_0, t_0)\,ds \tag{2.10}$$

respectively, where $\Phi(x) = \int_{-\infty}^x \varphi(z, 1)\,dz$ and $\Psi(x) = 1 - \Phi(x)$.

Since by assumption $U(t)$ and $L(t)$ are differentiable, we have

$$\lim_{s \uparrow t} \Psi(U(t), t|U(s), s) = \lim_{s \uparrow t} \Phi(L(t), t|L(s), s) = \Psi(0) = \frac{1}{2}, \tag{2.11}$$

$$\lim_{s \uparrow t} \Psi(U(t), t|L(s), s) = \lim_{s \uparrow t} \Phi(L(t), t|U(s), s) = \Psi(\infty) = \Phi(-\infty) = 0. \tag{2.12}$$

Hence, the kernels $\Psi(\cdot)$ and $\Phi(\cdot)$ are non-singular and are differentiable with respect
to $t$ for the case $X = B$.

The system of integral equations (2.6) and (2.7) (and the corresponding Eqs. (2.9) 
and (2.10) for the case of Brownian bridge) are Volterra equations of the first kind; 
they can be reduced to Volterra integral equations of the second kind which are 
numerically more suitable. 


Theorem 2 Let $f_U(t|x_0, t_0)$ and $f_L(t|x_0, t_0)$ be the probability density functions of
$\tau_U$ and $\tau_L$ respectively and let $p(y, t|x, s) = \frac{\partial}{\partial t} P(y, t|x, s)$. Then

$$f_U(t|x_0, t_0) = 2p(y_1, t|x_0, t_0) - 2\int_{t_0}^t p(y_1, t|U(s), s)\,f_U(s|x_0, t_0)\,ds - 2\int_{t_0}^t p(y_1, t|L(s), s)\,f_L(s|x_0, t_0)\,ds, \tag{2.13}$$

$$f_L(t|x_0, t_0) = 2p(y_2, t|x_0, t_0) - 2\int_{t_0}^t p(y_2, t|U(s), s)\,f_U(s|x_0, t_0)\,ds - 2\int_{t_0}^t p(y_2, t|L(s), s)\,f_L(s|x_0, t_0)\,ds \tag{2.14}$$

hold for any $y_1 \in [U(t), \infty)$ and $y_2 \in (-\infty, L(t)]$.


Equations (2.13) and (2.14) are obtained by differentiating (2.6) and (2.7) with
respect to $t$ and then using the relation

$$\lim_{s \uparrow t} P(U(t), t|U(s), s) = \lim_{s \uparrow t} P(L(t), t|L(s), s) = \frac{1}{2}.$$

Note that a similar approach for the one-sided case was used by Fortet [13].
It follows that for a standard Brownian bridge with the initial condition $B_{t_0} = x_0$,
we have

$$f_U(t|x_0, t_0) = 2\,\frac{\partial \Psi(U(t)|x_0, t_0)}{\partial t} - 2\int_{t_0}^t \frac{\partial \Psi(U(t)|U(s), s)}{\partial t}\,f_U(s|x_0, t_0)\,ds - 2\int_{t_0}^t \frac{\partial \Psi(U(t)|L(s), s)}{\partial t}\,f_L(s|x_0, t_0)\,ds \tag{2.15}$$

and

$$f_L(t|x_0, t_0) = 2\,\frac{\partial \Phi(L(t)|x_0, t_0)}{\partial t} - 2\int_{t_0}^t \frac{\partial \Phi(L(t)|U(s), s)}{\partial t}\,f_U(s|x_0, t_0)\,ds - 2\int_{t_0}^t \frac{\partial \Phi(L(t)|L(s), s)}{\partial t}\,f_L(s|x_0, t_0)\,ds. \tag{2.16}$$


Due to properties (2.11) and (2.12), singularities in the denominator of the kernels
can be removed.


3 Numerical Integration Procedure for Approximating BCP


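For Settings 1 and 2 we use (2.1) and (2.2):

$$P_1(x) = \int_{-\infty}^{\infty} \int_0^1 \varphi(y, 1)\,\big(f_U(t|0, 0) + f_L(t|0, 0)\big)\,dt\,dy$$

and

$$P_2(x) = 1 - \operatorname{erf}(x/\sqrt{2}) + \int_{-x\sqrt{a(1-a)}}^{x\sqrt{a(1-a)}} \int_a^b \varphi\big(y, \sqrt{a(1-a)}\big)\,\big(f_U(t|y, a) + f_L(t|y, a)\big)\,dt\,dy$$

with the boundaries $L$ and $U$ defined in Sect. 2 for the respective setting. To calculate $f_U$ and $f_L$ we
use (2.15) and (2.16).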

Let $t_i = t_0 + ih$, $i = 1, \dots, m$, where $h$ is the time step size of a uniform
discretisation. We use the Euler approximation to obtain $f_L(t_i) = f_L(t_i|x, s)$
and $f_U(t_i) = f_U(t_i|x, s)$ at an increasing sequence of knots $t_i$ and the appropriate
Gaussian quadrature for numerical integrations.
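
To make the discretisation idea concrete, the following minimal sketch solves the first-kind system (2.9) and (2.10) for a Brownian bridge started at $B_{t_0} = x_0$, using a midpoint rule on a uniform grid. This is a simplified stand-in chosen for brevity, not the Euler scheme for (2.15) and (2.16) used in the paper; the function names, the midpoint rule and the decision to stop one step before $t = 1$ (where the bridge variance vanishes) are all assumptions of the sketch.

```python
import math

def upper_tail(z):
    """Psi(z) = P(N(0,1) >= z); note Phi(z) = upper_tail(-z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def bridge_z(y, t, x, s):
    """Standardised level y at time t for a bridge with B_s = x (s < t < 1)."""
    mean = x * (1.0 - t) / (1.0 - s)
    sd = math.sqrt((t - s) * (1.0 - t) / (1.0 - s))
    return (y - mean) / sd

def crossing_probability(lower, upper, m, x0=0.0, t0=0.0):
    """Approximate P{bridge leaves (lower(t), upper(t)) before time 1 - h}."""
    h = (1.0 - t0) / m
    mid = [t0 + (k + 0.5) * h for k in range(m)]  # cell midpoints s_k
    f_u, f_l = [], []                             # densities at the midpoints
    for i in range(1, m):                         # t_i = t0 + i*h < 1
        t = t0 + i * h
        u_t, l_t = upper(t), lower(t)
        # Left-hand sides of (2.9) and (2.10).
        rhs_u = upper_tail(bridge_z(u_t, t, x0, t0))
        rhs_l = upper_tail(-bridge_z(l_t, t, x0, t0))
        # Subtract the contribution of the cells that are already solved.
        for k in range(i - 1):
            s = mid[k]
            u_s, l_s = upper(s), lower(s)
            rhs_u -= h * (upper_tail(bridge_z(u_t, t, u_s, s)) * f_u[k]
                          + upper_tail(bridge_z(u_t, t, l_s, s)) * f_l[k])
            rhs_l -= h * (upper_tail(-bridge_z(l_t, t, u_s, s)) * f_u[k]
                          + upper_tail(-bridge_z(l_t, t, l_s, s)) * f_l[k])
        # Solve the 2x2 system for the newest cell (midpoint s_{i-1}).
        s = mid[i - 1]
        u_s, l_s = upper(s), lower(s)
        a = upper_tail(bridge_z(u_t, t, u_s, s))
        b = upper_tail(bridge_z(u_t, t, l_s, s))
        c = upper_tail(-bridge_z(l_t, t, u_s, s))
        d = upper_tail(-bridge_z(l_t, t, l_s, s))
        det = a * d - b * c
        f_u.append((rhs_u * d - b * rhs_l) / (h * det))
        f_l.append((a * rhs_l - c * rhs_u) / (h * det))
    return h * (sum(f_u) + sum(f_l))
```

For constant boundaries $\pm x$ and $x_0 = t_0 = 0$ the result can be compared with the classical Kolmogorov tail $2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2x^2}$, which gives a simple sanity check before moving to curved boundaries.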

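For each of the aforementioned cases, we perform $N$ simulations and use
$m$ equally spaced discretisation points in the time interval $T$, and we estimate the tail
probabilities

$$p_{f,g;m}(x) = P[D_{f,g;m} \ge x] \approx \frac{1}{N} \sum_{j=1}^{N} I\big(D^{(j)}_{f,g;m} \ge x\big),$$

where $I(\cdot)$ is an indicator function and $D^{(j)}_{f,g;m}$ is the value of the discretised statistic on the $j$th simulated path.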
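
A Monte Carlo estimate of this kind can be sketched as follows for Setting 1 ($f = 1$). The Brownian bridge is built from a discretised Brownian motion $W$ as $B_t = W_t - tW_1$; the function name, default sample sizes and seed are illustrative assumptions and are far smaller than the values used for the tables.

```python
import numpy as np

def simulate_tail_probability(x, g, n_paths=4000, m=500, seed=0):
    """Estimate P{sup_t |B_t - g(t) xi| >= x} on an m-point uniform grid.

    g must accept a NumPy array of time points and return an array.
    """
    rng = np.random.default_rng(seed)
    h = 1.0 / m
    t = np.arange(1, m + 1) * h                      # grid t_1, ..., t_m = 1
    increments = rng.normal(0.0, np.sqrt(h), size=(n_paths, m))
    w = np.cumsum(increments, axis=1)                # Brownian motion paths
    bridge = w - t * w[:, -1:]                       # B_t = W_t - t * W_1
    xi = rng.normal(size=(n_paths, 1))               # independent N(0, 1)
    stat = np.max(np.abs(bridge - g(t) * xi), axis=1)
    return float(np.mean(stat >= x))
```

For example, `simulate_tail_probability(1.2, lambda t: t ** (2.0 / 3.0) - t)` corresponds to the case $\alpha = 2/3$ under the reading $g(t) = t^{\alpha} - t$ of Setting 1. Because the supremum is taken over a finite grid, the estimate is biased slightly downwards, which is one reason a fine grid is needed in the simulations.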
Table 1 Setting 1: estimated tail probabilities $P_1(x)$ using integral equations, compared to
simulations $p_{f,g;m}(x)$


x   | $P_1(x)$ | $p_{f,g;m}(x)$ | Var[$p_{f,g;m}(x)$] × 10^7 | $|P_1(x) - p_{f,g;m}(x)|$
0.4 | 0.997467 | 0.997395       | 0.025982                   | 0.000072
1.2 | 0.128036 | 0.127972       | 1.115952                   | 0.000064
2.0 | 0.001056 | 0.001040       | 0.010389                   | 0.000016
2.2 | 0.000219 | 0.000233       | 0.002329                   | 0.000014


For Setting 1 we calculate P (x), the tail probabilities approximated using the 
aforementioned approximation for p f,g:m(x) with m = 2!9 discretised time steps 
and the usage of n = 20 Gauss-Hermite nodes in the numerical integrations. Table 1 
contains the comparisons of P (x) and Pp f.g:m- 
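
The outer integration in (2.1) is an expectation with respect to a standard normal density, which is what makes Gauss-Hermite quadrature a natural choice. A minimal sketch of that step is given below; the helper name `gaussian_expectation` is hypothetical, and the integrand `func` stands in for $y \mapsto 1 - P_1(x, y)$, which in practice would come from solving the integral equations for each node.

```python
import numpy as np

def gaussian_expectation(func, n_nodes=20):
    """Approximate E[func(Y)] for Y ~ N(0, 1) by Gauss-Hermite quadrature.

    hermgauss returns nodes and weights for the weight exp(-z^2), so the
    substitution y = sqrt(2) * z and the factor 1/sqrt(pi) are needed.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    values = func(np.sqrt(2.0) * nodes)
    return float(np.sum(weights * values) / np.sqrt(np.pi))
```

With 20 nodes the rule is exact for polynomials of degree up to 39, so smooth integrands such as the one in (2.1) are typically handled with few evaluations of the inner boundary crossing probability.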

For Setting 2 we calculate Py (x), the tail probabilities approximated using 
the aforementioned approximation for pfo:m(x) in this setting, with m = a 
discretised time steps and the usage of n = 20 Gauss-Legendre nodes in the 
numerical integrations. In Table 2 we compare P(x) to simulation results, and 
asymptotic estimators P(x) generated from [7]. 

The numerical approximations using integral equations are performed on a 
Macintosh laptop computer running OS X with 8GB RAM and 1600 MHz CPU, 
and computer programs are implemented in C++98. Each point of pg(x) and p;(x) 
in the tables takes 1.92 and 1.87s respectively. The Monte Carlo simulations for all 
cases are carried out on a cluster computer with 28 parallel CPUs using N = 10’ 
simulation runs and m = 2!° equally-spaced discretization time intervals. 

In both settings, tail probabilities estimated using integral equations and sim- 
ulation approach differ only at the third decimal place and beyond, suggesting 
that the numerical integration approach delivers a level of accuracy comparable to 
those of the simulation approach despite the use of a comparatively coarser grain 
discretization time step, i.e, m = 2!° in the former as opposed to m = 2!° in the 
latter. 

The major advantage of our method is the relative simplicity and fast calculations 
compared to other techniques. For approximations using integral equations, since 
discretization time steps of the order m = 2! is sufficient to deliver tail probability 
estimates comparable to those of simulation at N = 107 and m = 2!°, they can be 
evaluated in a modest computational framework to reduce costs. 
Introduction to Extreme Value Theory:
Applications to Risk Analysis
and Management


Marie Kratz 


Abstract We present an overview of Univariate Extreme Value Theory (EVT) 
providing standard and new tools to model the tails of distributions. One of the 
main issues in the statistical literature of extremes concerns the tail index estimation, 
which governs the probability of extreme occurrences. This estimation relies heavily 
on the determination of a threshold above which a Generalized Pareto Distribution 
(GPD) can be fitted. Approaches to this estimation may be classified into two
classes: one qualified as ‘supervised’, using standard Peak Over Threshold (POT)
methods, in which the threshold to estimate the tail is chosen graphically according
to the problem, and the other collecting unsupervised methods, where the threshold
is algorithmically determined.

We introduce here a new and practically relevant method belonging to this second 
class. It is a self-calibrating method for modeling heavy tailed data, which we 
developed with N. Debbabi and M. Mboup. Effectiveness of the method is addressed 
on simulated data, followed by applications in neuro-science and finance. Results 
are compared with those obtained by more standard EVT approaches. 

Then we turn to the notion of dependence and the various ways to measure it, 
in particular in the tails. Through examples, we show that dependence is also a 
crucial topic in risk analysis and management. Underestimating the dependence 
among extreme risks can lead to serious consequences, as for instance those we 
experienced during the last financial crisis. We introduce the notion of copula, which 
splits the dependence structure from the marginal distribution, and show how to use 
it in practice. Taking into account the dependence between random variables (risks) 
allows us to extend univariate EVT to multivariate EVT. We only give the first 
steps of the latter, to motivate the reader to follow or to participate in the increasing 
research development on this topic. 


1 Introduction


Quantitative risk analysis used to rely, until recently, on classical probabilistic 
modeling where fluctuations around the average were taken into account. The 
standard deviation was the usual way to measure risk, like, for instance, in 
Markowitz portfolio theory [26], or in the Sharpe Ratio [34]. The evaluation of 
“normal” risks is more comfortable because it can be well modelled and predicted 
by the Gaussian model and so is easily insurable. The series of catastrophes that 
hit the World at the beginning of this century (see Fig. 1), natural (earthquakes, 
volcano eruption, tsunami, ...) or financial (subprime crisis, sovereign crisis) or 
political (Arab Spring, ISIS, Ukraine, ...), made it clear that it is crucial nowadays 
to take also extreme occurrences into account; indeed, although it concerns events 
that rarely occur (i.e. with a very small probability), their magnitude is such that 
their consequences are dramatic when they hit unprepared societies. 

Including extreme risks in probabilistic models is recognized nowadays as a
necessary condition for good risk management in any institution, and is no longer
restricted to reinsurance companies, who are the providers of cover for natural
catastrophes. For instance in finance, minimizing the impact of extreme risks,
or even ignoring them because of a small probability of occurrence, has been
considered by many professionals and supervisory authorities as a factor of
aggravation of the financial crisis of 2008-2009. The American Senate and the
Basel Committee on Banking Supervision confirm this statement in their reports.
Therefore, including and evaluating correctly extreme risks has become very topical
and crucial for building up the resilience of our societies.

Fig. 1 Some of the extreme events (covered by reinsurances) that hit the World between 2001 and
2015

The literature on extremes is very broad; we present here an overview of some 
standard and new methods in univariate Extreme Value Theory (EVT) and refer the 
reader to books on the topic [1, 8, 12, 24, 31-33] and also on EVT with applications 
in finance or integrated in quantitative risk management [25, 27, 29]. Then we 
develop the concept of dependence to extend univariate EVT to multivariate EVT. 
Throughout, applications in various fields, including finance, insurance and quantitative
risk management, illustrate the various concepts and tools.


1.1 What Is Risk?


Risk is a word widely used by many people and not only by professional risk 
managers. It is therefore useful to spend a bit of time analysing this concept. We 
start by looking at its definition in common dictionaries. There, we find that it is 
mainly identified to the notion of danger of loss: 


The Oxford English Dictionary: Hazard, a chance of bad consequences, loss or 
exposure to mischance. 


For financial risks: “Any event or action that may adversely affect an organiza- 
tion’s ability to achieve its objectives and execute its strategies” or, alternatively, 
“the quantifiable likelihood of loss or less-than-expected returns”. 


Webster’s College Dictionary (insurance): “The chance of loss” or “The degree 
of probability of loss” or “The amount of possible loss to the insuring company” 
or “A person or thing with reference to the risk involved in providing insurance” 
or “The type of loss that a policy covers, as fire, storm, etc.” 


However, strictly speaking, risk is not simply associated with danger. In its
modern sense, it can also be seen as an opportunity for a profit. This is the main
reason why people accept being exposed to risk. In fact, this view had already started
to develop in the eighteenth century. For instance, the French philosopher
Etienne de Condillac (1714-1780) defined risk as “The chance of incurring a bad
outcome, coupled, with the hope, if we escape it, to achieve a good one”.

Another concept born from the management of risk is the insurance industry.
In the seventeenth century, the first insurance for buildings was created after the Great
Fire of London (1666). During the eighteenth century the notion appeared that social
institutions should protect people against risk. This contributed to the development
of life insurance, which was not really accepted by religion at the time. In this
context, the definition of ‘Insurance’ as the transfer of risk from an individual to a
group (company) takes its full meaning. The nineteenth century saw the continuous
development of private insurance companies.
Independently of any context, risk relates strongly to the notion of randomness
and the uncertainty of future outcomes. The distinction between “uncertainty” and
“risk” was first introduced by the American economist Frank H. Knight (1885- 
1972), although they are related concepts. According to Knight, risk can be defined 
as randomness with knowable probabilities, contrary to uncertainty, which is ran- 
domness with unknowable probabilities. We could then define risk as a measurable 
uncertainty, the ‘known-unknowns’ according to Donald Rumsfeld’s terminology, 
whereas uncertainty is unmeasurable, the ‘unknown-unknowns’ (D. Rumsfeld). 
Of course, research and improved knowledge help to transform some uncertainty 
into risk with knowable probabilities. 

In what follows, we focus on the notion of risk, in particular extreme risk, and 
its quantification. We choose a possible definition of risk, used within a probabilistic
framework, namely the variation from the expected outcome over time.

There are numerous ways to measure risk and many risk measures have been 
developed in the literature. Most modern measures of the risk in a portfolio are 
statistical quantities describing the conditional or unconditional loss distribution of 
the portfolio over some predetermined horizon. Here we present only popular ones, 
used in particular in regulation. We recall their definition and refer to Emmer et al. 
[13] and references therein for their mathematical properties. 


[Figure: loss distribution (x-axis: gross losses incurred, EUR million) annotated with the mean and three risk measures. Standard deviation: measures the typical size of fluctuations. Value-at-Risk (VaR): measures the position of the 99th percentile, "happens once in a hundred years". Expected Shortfall (ES): the weighted average VaR beyond the 1% threshold.]


Standard Deviation $\sigma$: Portfolio theory (Markowitz)
Value-at-Risk (VaR) = quantile $q_\alpha(L) = \inf\{q \in \mathbb{R} : P(L > q) \le 1 - \alpha\}$
Expected Shortfall (ES) (or TVaR):

$$ES_\alpha(L) = \frac{1}{1-\alpha} \int_\alpha^1 q_\beta(L)\,d\beta = E[L \mid L \ge q_\alpha(L)],$$

where the second equality holds when the cdf of $L$ is continuous.
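
As an illustration of these two definitions, the sketch below computes their empirical counterparts from a sample of losses. The function names are illustrative; the VaR uses the lower empirical quantile, matching $\inf\{q : P(L > q) \le 1 - \alpha\}$, and the ES uses the conditional-mean form $E[L \mid L \ge q_\alpha(L)]$, which for a finite sample is only an approximation of the integral form.

```python
import math

def value_at_risk(losses, alpha):
    """Empirical alpha-quantile: smallest q with P_n(L > q) <= 1 - alpha."""
    ordered = sorted(losses)
    # Index of the order statistic; the small offset guards against
    # floating-point noise when len(losses) * alpha is an integer.
    k = max(1, math.ceil(len(ordered) * alpha - 1e-12))
    return ordered[k - 1]

def expected_shortfall(losses, alpha):
    """Empirical mean of the losses at or beyond the Value-at-Risk."""
    var = value_at_risk(losses, alpha)
    tail = [x for x in losses if x >= var]
    return sum(tail) / len(tail)
```

On the toy sample `losses = [1, 2, ..., 10]` with `alpha = 0.9`, the VaR is the ninth order statistic and the ES averages the two largest losses, which shows how the ES looks beyond the quantile that the VaR only locates.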
We will focus on the analysis of extreme risks, related to unexpected, abnormal
or extreme outcomes.

Many questions arise, such as: How to model extreme risks? How to study the behavior
in the tails of the distribution of the model? How to capture dependency and measure
it in risk models? Which methods can be used? What about aggregation of risks?

First we present the main concepts of univariate EVT. Then we introduce the 
issue of dependence among random variables (risks). 


1.2 Impact of Extreme Risks


When considering financial assets, because of the existence of a finite variance, a 
normal approximation is often chosen in practice for the unknown distribution of the 
yearly log returns, justified by the use of the Central Limit Theorem (CLT), when 
assuming independent and identically distributed (iid) observations. Such a choice 
of modeling, in particular using light tail distributions, has shown itself grossly 
inadequate during the last financial crisis when dealing with risk measures because 
it leads to underestimating the risk. 

In Fig. 2, the QQ-plot of the S&P 500 daily returns from 1987 to 2007 helps
to detect a heavy tail. When aggregating the daily returns into monthly returns,
the QQ-plot looks more like a normal one, and the very few observations appearing
above the threshold of VaR$_{99\%}$, among which the financial crises of 1998 and 1987,
could almost be considered as outliers, as it is well known that financial returns are
almost symmetrically distributed. Now, look at Fig. 3. When adding data from 2008
to 2013, the QQ-plot looks much the same, i.e. normal, except that another “outlier”
appears ... with the date of October 2008! Instead of looking again at daily data
for the same years, let us consider a larger sample of monthly data from 1791 to
2013 (as compiled by Global Finance Data). With a larger sample size, the heavy
tail becomes visible again. And now we see that the financial crisis of 2008 does
belong to the heavy tail of the distribution and cannot be considered anymore as an
outlier. Although it is known, by Feller's theorem, that the tail index of the underlying
distribution remains constant under aggregation, we clearly see the importance of
the sample size in making the tail visible. The figures on the S&P 500 returns illustrate
this issue very clearly.

Fig. 2 QQ-plots of the S&P500 daily (left) and monthly (right) log-returns from 1987 to 2007

Fig. 3 QQ-plots of the S&P500 monthly log-returns from 1987 (left) or 1791 (right) to 2013


2 Univariate EVT 


Let $(X_i)_{i=1,\dots,n}$ be iid random variables (rv) with parent rv $X$ and continuous
cumulative distribution function (cdf) $F$ (that is unknown). The associated order
statistics are denoted by

$$\min_{1 \le i \le n} X_i = X_{1,n} \le X_{2,n} \le \dots \le X_{n-1,n} \le X_{n,n} = \max_{1 \le i \le n} X_i.$$

2.1 CLT Versus EVT 


• Mean behavior. Assuming the existence of the variance $\sigma^2$ of $X$, the Central
Limit Theorem (CLT) tells us that the empirical mean $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$, when
normalized (since $\mathrm{var}(\bar{X}_n) = \frac{1}{n}\mathrm{var}(X) \to 0$ as $n \to \infty$), has an asymptotic standard
Gaussian distribution (whatever $F$ is):

$$\frac{\bar{X}_n - E(\bar{X}_n)}{\sqrt{\mathrm{var}(\bar{X}_n)}} = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow[n \to \infty]{d} N(0, 1),$$

i.e. $\lim_{n \to \infty} P[(\bar{X}_n - b_n)/a_n \le x] = F_{N(0,1)}(x)$ with $b_n = E(X)$, $a_n = \sqrt{\mathrm{var}(X)/n}$.


• Extreme behavior. Instead of looking at the mean behavior, consider now the
extreme behavior, with for instance the maximum. Noticing that

$$P\Big[\max_{1 \le i \le n} X_i \le x\Big] = \prod_{i=1}^n P[X_i \le x] = F^n(x) \xrightarrow[n \to \infty]{} \begin{cases} 0 & \text{if } F(x) < 1 \\ 1 & \text{if } F(x) = 1, \end{cases}$$

could we find, as for the CLT, a linear transformation to avoid such degeneracy,
and say that there exist sequences $(a_n)$, $(b_n)$ and a rv $Z$ with cdf $H$ such that
$\lim_{n \to \infty} P[(\max X_i - b_n)/a_n \le x] = H(x)$? This comes down to looking for $(a_n)$ and
$(b_n)$, and a non-degenerate cdf $H$ s.t.

$$P\Big[\frac{\max_{1 \le i \le n} X_i - b_n}{a_n} \le x\Big] = P\Big[\max_{1 \le i \le n} X_i \le a_n x + b_n\Big] = F^n(a_n x + b_n) \xrightarrow[n \to \infty]{} H(x).$$

It can be proved that there is not a unique limit distribution as for the CLT, but
three possible asymptotic distributions (whatever $F$ is), namely:


Theorem 1 (The ‘Three-Types Theorem’; Fréchet-Fisher-Tippett Theorem,
1927-1928; [15]) The rescaled sample extreme (max renormalized) has a limiting
distribution $H$ that can only be of three types:

$$H_{1,\alpha}(x) := \exp\{-x^{-\alpha}\}\,\mathbf{1}_{(x > 0)} \quad (\alpha > 0): \text{ Fréchet}$$
$$H_{2,\alpha}(x) := \mathbf{1}_{(x \ge 0)} + \exp\{-(-x)^{\alpha}\}\,\mathbf{1}_{(x < 0)} \quad (\alpha > 0): \text{ Weibull}$$
$$H_{3,0}(x) := \exp\{-e^{-x}\}, \ \forall x \in \mathbb{R}: \text{ Gumbel}$$

(A similar result holds for the minimum).

We can then classify the distributions according to the three possible limiting 
distributions of the (rescaled) maximum, introducing the notion of Maximum 
Domain of Attraction (MDA): 


$$F \in MDA(H) \iff \exists\,(a_n) > 0,\ (b_n): \ \forall x \in \mathbb{R}, \ \lim_{n \to \infty} F^n(a_n x + b_n) = H(x).$$

For instance, for Fréchet, $a_n = F^{-1}(1 - 1/n)$ and $b_n = 0$. (Note that most of the
cdfs $F$ we use usually belong to an MDA.)
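
The definition can be checked numerically on a simple case. The sketch below takes $F$ to be the Exponential(1) cdf with the standard norming constants $a_n = 1$, $b_n = \log n$ (a textbook choice, not stated in the text above) and compares $F^n(a_n x + b_n)$ with the Gumbel cdf; the function names are illustrative.

```python
import math

def rescaled_max_cdf_exponential(x, n):
    """F^n(x + log n) for iid Exponential(1) variables, F(y) = 1 - exp(-y)."""
    arg = x + math.log(n)
    if arg <= 0.0:
        return 0.0
    return (1.0 - math.exp(-arg)) ** n

def gumbel_cdf(x):
    """Gumbel cdf exp(-exp(-x))."""
    return math.exp(-math.exp(-x))
```

Since $F^n(x + \log n) = (1 - e^{-x}/n)^n$, the convergence to $\exp(-e^{-x})$ is the elementary limit defining the exponential function, which is why the Exponential(1) distribution belongs to the Gumbel MDA.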

To mimic the CLT, the three types of extreme value distribution have been
combined into a single three-parameter family [18, 19, 36] known as the Generalized
Extreme Value Distribution (GEV).
Theorem 2 (The EV Theorem) If $F \in MDA(G)$ then, necessarily, $G$ is of the
same type as the GEV cdf $H_\xi$ (i.e. $G(x) = H_\xi(ax + b)$, $a > 0$), defined by

$$H_\xi(x) = \begin{cases} \exp\big(-(1 + \xi x)_+^{-1/\xi}\big) & \text{if } \xi \ne 0 \\ \exp(-e^{-x}) & \text{if } \xi = 0 \end{cases}$$

where $y_+ = \max(0, y)$. The parameter $\xi$, named the tail (or extreme-value) index,
determines the nature of the tail distribution: if $\xi > 0$ then $H_\xi$ is Fréchet, if $\xi = 0$
then $H_\xi$ is Gumbel, and if $\xi < 0$ then $H_\xi$ is Weibull.

We can write

$$G(x) = G_{\mu,\sigma,\xi}(x) = \exp\Big[-\Big(1 + \xi\,\frac{x - \mu}{\sigma}\Big)^{-1/\xi}\Big] \quad \text{for } 1 + \xi\,\frac{x - \mu}{\sigma} > 0.$$

Moments of the GEV: the $k$th moment exists if $\xi < 1/k$ (in particular
$E(X) < \infty$ if $\xi < 1$ and $\mathrm{var}(X) < \infty$ if $\xi < 1/2$).


Example:

Sample cdf | MDA
Uniform | Weibull
Exponential(1) ($F(x) = 1 - e^{-x}$, $x > 0$) | Gumbel
Gaussian | Gumbel
Log-normal | Gumbel
Gamma ($\lambda$, 1) | Gumbel
Cauchy ($F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan x$) | Fréchet
Student | Fréchet
Pareto ($\beta$) ($F(x) = 1 - x^{-\beta}$, $x \ge 1$, $\beta > 0$) | Fréchet


Example of tails of distributions: Fréchet and Gumbel versus Gaussian (normal).

[Figure: tail comparison plots; legend: Fréchet (alpha = 3), Normal]

We observe that the tail can vary substantially according to the type of distribution. Here the tail
of the Fréchet distribution is moderately heavy ($\alpha = 1/\xi = 3$) although it looks much heavier than
the Gaussian distribution

Exercise Considering the S&P500 daily log returns from 1987 to 2016, compute
the chances to find a value smaller than the second minimum, i.e. $P[X \le x_{(n-1)}] =
\Phi\big(\frac{x_{(n-1)} - \mu}{\sigma}\big)$ (with $\Phi$ the standard normal cdf), assuming the data are normally
distributed. We obtain the following statistics on S&P500 daily log returns from
1987 to 2016:

Expected Value ($\mu$) | 0.029%
Standard Deviation ($\sigma$) | 1.192%
Maximum Value ($x_{(1)}$) | 11.0%
Minimum Value ($x_{(n)}$) | -22.9%
Second Minimum ($x_{(n-1)}$) | [illegible]
Probability of finding a value smaller than $x_{(n-1)}$ in a Gaussian Model | 2.96E-16 (1 over 13.5 billion years)
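The Gaussian computation behind this exercise can be sketched as follows. This is a minimal illustration, not the author's code: the helper name `normal_cdf` is hypothetical, and, as an assumption, it is applied to the reported minimum (-22.9%) together with the reported mean and standard deviation, all expressed in percent.

```python
import math

def normal_cdf(x, mu, sigma):
    """Gaussian cdf Phi((x - mu) / sigma), written with erfc so that
    very small left-tail probabilities are not rounded to zero."""
    z = (x - mu) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Statistics quoted in the text (in percent).
mu, sigma = 0.029, 1.192
x_min = -22.9

# Probability, under the Gaussian model, of a daily log return below the observed minimum.
p_min = normal_cdf(x_min, mu, sigma)
print(f"z-score of the minimum: {(x_min - mu) / sigma:.2f}")
print(f"Gaussian probability:   {p_min:.3e}")
```

The minimum lies about 19 standard deviations below the mean, so the Gaussian model assigns it a vanishingly small probability, which is the point of the exercise.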


Characteristic Property of the GEV. A distribution is a GEV if and only if it is
max-stable, i.e. it satisfies $\max_{1\le i\le n} X_i \overset{d}{=} a_n X + b_n$, with $a_n > 0$.

For the three types of the GEV, we have:
Fréchet: $\max_{1\le i\le n} X_i \overset{d}{=} n^{1/\alpha} X$; Weibull: $\max_{1\le i\le n} X_i \overset{d}{=} n^{-1/\alpha} X$; Gumbel: $\max_{1\le i\le n} X_i \overset{d}{=} X + \log n$.
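Max-stability can be checked numerically at the level of cdfs: if $H$ is max-stable with constants $a_n$, $b_n$, then $H^n(a_n x + b_n) = H(x)$. The sketch below (function names are illustrative only) verifies this for the Fréchet type with $a_n = n^{1/\alpha}$, $b_n = 0$, and for the Gumbel type with $a_n = 1$, $b_n = \log n$.

```python
import math

def frechet_cdf(x, alpha):
    """Standard Fréchet cdf exp(-x^(-alpha)) for x > 0, and 0 otherwise."""
    return math.exp(-x ** (-alpha)) if x > 0 else 0.0

def gumbel_cdf(x):
    """Standard Gumbel cdf exp(-exp(-x))."""
    return math.exp(-math.exp(-x))

def frechet_max_cdf(x, alpha, n):
    """cdf of the maximum of n iid Fréchet rvs, evaluated at n^(1/alpha) * x."""
    return frechet_cdf(n ** (1.0 / alpha) * x, alpha) ** n

def gumbel_max_cdf(x, n):
    """cdf of the maximum of n iid Gumbel rvs, evaluated at x + log(n)."""
    return gumbel_cdf(x + math.log(n)) ** n

for n in (2, 10, 1000):
    print(n, frechet_max_cdf(1.5, 3.0, n), frechet_cdf(1.5, 3.0),
          gumbel_max_cdf(0.7, n), gumbel_cdf(0.7))
```

For each $n$, the rescaled cdf of the maximum coincides (up to floating-point error) with the cdf of a single observation.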


2.2. A Limit Theorem for Extremes: The Pickands Theorem 


Extracting more information from the tail of the distribution than just that given by the
maximum should help for the evaluation of the tail. So considering the kth (k > 1)
largest order statistics, we introduce the notion of 'threshold exceedances' where all
data are extreme in the sense that they exceed a high threshold.

Picking a high threshold $u < x_F^+$ (the upper endpoint of F), we study all
exceedances above u.


Theorem 3 (Pickands Theorem, 1975) If F belongs to one of the maximum
domains of attraction (i.e. the limit distribution of $\max X_i$ is a GEV), then for a
sufficiently high threshold u, $\exists\, \beta(u) > 0$ and $\xi \in \mathbb{R}$ such that the Generalized
Pareto Distribution (GPD) $G_{\xi,\beta(u)}$, defined by

$$\bar G_{\xi,\beta(u)}(y) := 1 - G_{\xi,\beta(u)}(y) = \Big(1 + \xi\,\frac{y}{\beta(u)}\Big)^{-1/\xi} 1_{(\xi \neq 0)} + e^{-y/\beta(u)}\, 1_{(\xi = 0)},$$

is a very good approximation to the excess cdf $F_u(\cdot) := P[X - u \le \cdot \mid X > u]$:

$$\lim_{u \uparrow x_F^+}\ \sup_{0 \le y \le x_F^+ - u} \big| F_u(y) - G_{\xi,\beta(u)}(y) \big| = 0,$$

$x_F^+$ denoting the upper endpoint of F.

As for the GEV, we have three cases for the GPD, depending on the sign of the
tail index $\xi$:
• $\xi > 0$: $\bar G_{\xi,\beta}(y) \sim c\, y^{-1/\xi}$, $c > 0$ ("Pareto" tail): heavy-tail
(note that $E(X^k) = \infty$ for $k \ge 1/\xi$).
• $\xi < 0$: $x_G^+ = \beta/|\xi|$ (upper endpoint of G), similar to the Weibull type of the
GEV (short-tailed, Pareto type II distribution)
• $\xi = 0$: $\bar G_{\xi,\beta}(y) = e^{-y/\beta}$: light-tail (exponential distribution with mean $\beta$)

The mean of the GPD is defined for $\xi < 1$ by $E(X) = \frac{\beta}{1-\xi}$.
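As a small illustration of the three GPD cases, the survival function and the mean can be coded directly from the definitions above. This is a sketch under the stated parametrization (tail index $\xi$, scale $\beta > 0$); the function names `gpd_survival` and `gpd_mean` are hypothetical.

```python
import math

def gpd_survival(y, xi, beta):
    """Survival function 1 - G_{xi,beta}(y) of the GPD, for scale beta > 0."""
    if y <= 0:
        return 1.0
    if xi == 0.0:
        return math.exp(-y / beta)          # exponential (light-tail) case
    t = 1.0 + xi * y / beta
    if t <= 0.0:
        return 0.0                          # beyond the upper endpoint beta/|xi| when xi < 0
    return t ** (-1.0 / xi)

def gpd_mean(xi, beta):
    """Mean beta / (1 - xi), defined only for xi < 1."""
    if xi >= 1.0:
        raise ValueError("the GPD mean is infinite for xi >= 1")
    return beta / (1.0 - xi)

print(gpd_survival(2.0, 0.5, 1.0))   # heavy tail: (1 + 0.5 * 2)^(-2)
print(gpd_survival(3.0, -0.5, 1.0))  # short tail: 3 lies beyond the endpoint 2
print(gpd_mean(0.5, 2.0))
```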


2.3 Supervised Methods in EVT: Standard Thresholds Methods 


Univariate Extreme Value Theory (EVT) focuses on the tail distribution evaluation, 
more precisely on the estimation of the tail index. That is why the first and main 
question is how to determine the threshold above which observations are considered 
as extremes. Various methods have been developed to answer this question. We give 
here their main ideas and refer the reader e.g. to [12] for more details (see also the 
references therein). 


2.3.1 Peak Over Threshold (POT) Method 


This method, developed for the GPD by Davison and Smith [7], helps to decide on an
appropriate threshold for exceedance-based methods, by looking at the empirical
Mean Excess Plot (MEP). This graphical method can be qualified as supervised.
The mean excess (ME) function defined by $e(u) = E[X - u \mid X > u]$ can be
computed for any rv X (whenever its expectation exists). For instance, if X is
exponentially distributed ($G_{0,\sigma}$), then its ME function is a constant. If X is GPD
$G_{\xi,\sigma}$ distributed, with $\sigma > 0$ and $\xi < 1$, then its ME function is given by

$$e(u) = \frac{\sigma + \xi u}{1 - \xi}\, 1_{(\sigma + \xi u > 0)}.$$

Hence, via the Pickands theorem, the MEP of X with unknown cdf F should stay
reasonably close to a linear function from the threshold u at which the GPD provides
a valid approximation to the excess distribution of X: $E[X - u \mid X > u] \underset{u\to\infty}{\approx}
\sigma(u)/(1 - \xi)$. This is the way to select u, when considering the empirical MEP

$$\Big\{ \Big(v,\ \frac{1}{n_v} \sum_{i=1}^{n_v} (x_{(i)} - v)\Big) : v < x_{n,n} \Big\},$$

where the $x_{(i)}$ correspond to the $n_v$ observations that exceed v.

Then, u being chosen, we can use ML or Moments estimators to evaluate the tail
index $\xi$ (and the scaling parameter $\beta$).
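The empirical mean excess function used in the MEP can be sketched in a few lines. The helper names below are illustrative; the thresholds are taken at the distinct observed values strictly below the sample maximum, which is one common convention and an assumption here.

```python
def mean_excess(data, v):
    """Empirical mean excess over threshold v: average of (x - v) over observations x > v."""
    excesses = [x - v for x in data if x > v]
    if not excesses:
        raise ValueError("no observation exceeds the threshold")
    return sum(excesses) / len(excesses)

def mean_excess_plot_points(data):
    """Points (v, e_n(v)) of the empirical MEP, for v ranging over the
    distinct observed values strictly below the sample maximum."""
    thresholds = sorted(set(data))[:-1]
    return [(v, mean_excess(data, v)) for v in thresholds]

print(mean_excess([1, 2, 3, 4, 5], 2))
print(mean_excess_plot_points([1, 2, 3]))
```

In practice one plots these points and looks for the threshold from which the plot becomes roughly linear.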

Illustration: Example from Embrechts et al.'s book [12]

[Figure, three panels (a)-(c)]
(a) Data set: time series plot of AT&T weekly percentage loss data for the 521 complete weeks in
the period 1991-2000
(b) Sample MEP. Selection of the threshold at a loss value of 2.75% (102 exceedances)
(c) Empirical distribution of excesses and fitted GPD, with ML estimators $\hat\xi = 0.22$ and $\hat\beta = 2.1$
(with standard errors 0.13 and 0.34, respectively)


2.3.2 Tail Index Estimators for MDA(Fréchet) Distributions 


To determine the tail index, other graphical methods than MEP may be used. 
Various estimators of the tail index have been (and still are) built, starting with the 
Hill estimator [17], a moment estimator [11], the QQ-estimator [22], ..., the Hill 
estimator for truncated data [2], ... 

For a sample of size n, the tail index estimators are generally built on the $k =
k(n)$ upper order statistics, with $k(n) \to \infty$ such that $k(n)/n \to 0$, as $n \to \infty$.

Choosing k is usually the Achilles heel of all these (graphical) supervised 
procedures, including the MEP one, as already observed. 

Nevertheless it is remarkable to notice that for these methods, no extra informa- 
tion is required on the observations before the threshold (the n — kth order statistics). 

Let us present two tail index estimators under regular variation framework: the 
Hill estimator [17], as it is most probably still the most popular, and the QQ- 
estimator [22], which is based on a simple and intuitive idea (hence this choice). 

Assume $F \in MDA(\text{Fréchet})$ with tail index $\xi > 0$, i.e. $\bar F = 1 - F$ is regularly varying,
$\bar F \in RV_{-\alpha}$, with $\xi = \alpha^{-1}$. (Recall that a function f belongs to the class $RV_\rho$ of
regularly varying functions with index $\rho \in \mathbb{R}$ if $f : \mathbb{R}_+ \to \mathbb{R}_+$ satisfies
$\lim_{t\to\infty} f(tx)/f(t) = x^\rho$, for $x > 0$; see [3].) Consider the threshold $u = X_{n-k,n}$
with $k = k(n) \to \infty$ and $k/n \to 0$ as $n \to \infty$.


• The Hill estimator $H_{k,n}$ of the tail index $\xi = \alpha^{-1}$ is defined by, and satisfies [17],

$$H_{k,n} = \frac{1}{k} \sum_{i=0}^{k-1} \log\Big(\frac{X_{n-i,n}}{X_{n-k,n}}\Big) \overset{P}{\underset{n\to\infty}{\longrightarrow}} \xi.$$

This estimator is asymptotically normal, with asymptotic variance $1/\alpha^2$. The
Hill estimator can exhibit outrageous bias and graphical aids are often very difficult
to interpret accurately. So it is wise to consider alternative methods to supplement
the information given by the Hill estimator and associated plots. Thus we turn to the
QQ-plot.

• The QQ-estimator (Kratz and Resnick [22]), $Q_{k,n}$, of the tail index $\xi$:


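The QQ-method is based on the following simple observation: if we suspect that
the n-sample X comes from the continuous cdf F, then the plot of

$$\Big\{ \Big(\frac{i}{n+1},\ F(X_{i,n})\Big),\ 1 \le i \le n \Big\}$$

should be roughly linear, hence also the QQ-plot of $\big\{ \big(F^{\leftarrow}(\frac{i}{n+1}),\ X_{i,n}\big),\ 1 \le i \le n \big\}$
(considering the theoretical quantile $F^{\leftarrow}(\frac{i}{n+1})$ and the corresponding quantile $X_{i,n}$
of the empirical distribution function).

If $F = F_{\mu,\sigma}(x) = F_{0,1}\big(\frac{x-\mu}{\sigma}\big)$, since $F^{\leftarrow}_{\mu,\sigma}(y) = \sigma F^{\leftarrow}_{0,1}(y) + \mu$, the plot of

$$\Big\{ \Big(F^{\leftarrow}_{0,1}\Big(\frac{i}{n+1}\Big),\ X_{i,n}\Big),\ 1 \le i \le n \Big\}$$

should be approximately a line of slope $\sigma$ and intercept $\mu$.
Take the example of an n-sample Pareto($\alpha$) distributed ($\bar F(x) = x^{-\alpha}$, $x \ge 1$); then, for
$y > 0$, $\bar F_{0,\alpha}(y) := P[\log X_1 > y] = e^{-\alpha y}$ and the plot of

$$\Big\{ \Big(F^{\leftarrow}_{0,1}\Big(\frac{i}{n+1}\Big),\ \log X_{i,n}\Big),\ 1 \le i \le n \Big\} = \Big\{ \Big(-\log\Big(1 - \frac{i}{n+1}\Big),\ \log X_{i,n}\Big),\ 1 \le i \le n \Big\}$$

should be approximately a line with intercept 0 and slope $\alpha^{-1}$.

Now, just use the least squares estimator for the slope (SL), namely

$$SL(\{(x_i, y_i),\ 1 \le i \le n\}) = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar x\,\bar y}{\sum_{i=1}^{n} x_i^2 - n\,\bar x^2},$$

to conclude that, for the Pareto example, an estimator of $\alpha^{-1}$ $(= \xi)$ is

$$\widehat{\alpha^{-1}} = \frac{\sum_{i=1}^{n} -\log\big(\frac{i}{n+1}\big) \big\{ n \log X_{n-i+1,n} - \sum_{j=1}^{n} \log X_{n-j+1,n} \big\}}{n \sum_{i=1}^{n} \big(-\log\big(\frac{i}{n+1}\big)\big)^2 - \big(\sum_{i=1}^{n} -\log\big(\frac{i}{n+1}\big)\big)^2},$$

which we call the QQ-estimator.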

This method can be extended from Pareto to the general case $\bar F \in RV_{-\alpha}$; we can
define the QQ-estimator $Q_{k,n}$ of the tail index $\xi = \alpha^{-1}$, based on the upper k order
statistics, by

$$Q_{k,n} = SL\Big(\Big\{ \Big(-\log\Big(1 - \frac{i}{k+1}\Big),\ \log X_{n-k+i,n}\Big),\ 1 \le i \le k \Big\}\Big) = \frac{\sum_{i=1}^{k} -\log\big(\frac{i}{k+1}\big) \big\{ k \log X_{n-i+1,n} - \sum_{j=1}^{k} \log X_{n-j+1,n} \big\}}{k \sum_{i=1}^{k} \big(-\log\big(\frac{i}{k+1}\big)\big)^2 - \big(\sum_{i=1}^{k} -\log\big(\frac{i}{k+1}\big)\big)^2},$$

and we can prove [22] that the QQ-estimator is weakly consistent ($Q_{k,n} \overset{P}{\underset{n\to\infty}{\longrightarrow}} \xi$)
and asymptotically normal with asymptotic variance $2/\alpha^2$ (which is
larger than for the Hill, but the Hill estimator exhibits considerable bias in certain
circumstances). Whenever the threshold u is determined (corresponding to a kth
order statistic), we can estimate the parameters, in particular the tail index.
Illustration: Comparison of the Hill plot and the QQ-plot of estimates of $\alpha$.

• On Pareto (1) simulated data (sample size n = 1000)

[Figure: Hill plot and QQ-plot of the estimates]

The QQ-plot shows $\alpha^{-1} \approx 0.98$. It seems a bit less volatile than the Hill plot.

• On real data:

[Figure: Hill plot and QQ-plot of the estimates]

The Hill plot is somewhat inconclusive, whereas the QQ-plot indicates a value of
about 0.97


The QQ-method in practice: 


1. Make a QQ-plot of all the data (empirical vs theoretical quantile) 

2. Choose k based on visual observation of the portion of the graph that looks linear 

3. Compute the slope of the line through the chosen upper k order statistics and the 
corresponding exponential quantiles. 


Alternatively, for Hill and QQ methods: 


1. Plot $\{(k, \widehat{\alpha^{-1}}(k)),\ 1 \le k \le n\}$
2. Look for a stable region of the graph as representing the true value of $\alpha^{-1}$.


Using those graphical (supervised) methods to determine u or, equivalently k, is
an art as well as a science and the estimate of $\alpha$ is usually rather sensitive to the
choice of k (but this is the price to pay for having a method to fit the tail without
using any information before u).
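Both estimators are easy to compute once k is fixed. The sketch below is an illustration, not the authors' implementation: the Hill estimator averages log-ratios of the k upper order statistics to the threshold $X_{n-k,n}$, and the QQ-estimator is the least-squares slope of the points $(-\log(1 - i/(k+1)), \log X_{n-k+i,n})$. The simulated Pareto sample, the seed and the choice k = 500 are assumptions made only for the example.

```python
import math
import random

def hill_estimator(data, k):
    """Hill estimator of xi = 1/alpha based on the k upper order statistics."""
    xs = sorted(data)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    log_threshold = math.log(xs[n - k - 1])          # log X_{n-k,n}
    return sum(math.log(x) - log_threshold for x in xs[n - k:]) / k

def qq_estimator(data, k):
    """QQ-estimator: least-squares slope of log upper order statistics
    against standard exponential quantiles."""
    xs = sorted(data)
    n = len(xs)
    if not 2 <= k < n:
        raise ValueError("k must satisfy 2 <= k < n")
    q = [-math.log(1.0 - i / (k + 1.0)) for i in range(1, k + 1)]
    y = [math.log(x) for x in xs[n - k:]]            # log X_{n-k+i,n}, i = 1..k
    q_bar = sum(q) / k
    num = sum((qi - q_bar) * yi for qi, yi in zip(q, y))
    den = sum((qi - q_bar) ** 2 for qi in q)
    return num / den

# Simulated Pareto(alpha) sample by inversion: survival function x^(-alpha) for x >= 1.
rng = random.Random(0)
alpha = 2.0
sample = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(5000)]

k = 500
print("Hill:", hill_estimator(sample, k), " QQ:", qq_estimator(sample, k), " true xi:", 1.0 / alpha)
```

Repeating the computation over a range of k gives the Hill plot and the QQ-estimate plot discussed above.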

Let us turn to another method, which answers this concern and provides an 
automatic (algorithmic) determination of the threshold u (but requiring, in this case, 
the data information before u). 


2.4 A Self-Calibrated Method for Heavy-Tailed Data 


This method is based on a paper developed with N. Debbabi and M. Mboup [9].

We assume continuous (smooth transitions) and, with no loss of generality, right 
heavy tailed data (a similar treatment being possible on the left tail) belonging to 
the MDA(Fréchet). 

Whereas one of the motivations for this new method is to be able to determine the 
threshold above which we fit the GPD in an unsupervised way, it will also provide 
a good fit for the entire distribution. 

We introduce a hybrid model to fit the whole distribution underlying heavy tailed 
data. The idea is to consider both the mean and tail behaviors, and to use limit 
theorems for each one (as suggested and developed analytically in [21]), in order 
to make the model as general as possible. Therefore, we introduce a Gaussian 
distribution for the mean behavior, justified by the Central Limit Theorem (CLT), 
and a GPD for the tail (justified by the Pickands theorem). Then we bridge the gap 
between mean and asymptotic behaviors by inserting an exponential distribution 
used as a leverage to give full meaning of tail threshold to the junction point between 
the GPD and its exponential neighbour. 

We assume that the hybrid model distribution (which belongs to the Fréchet
MDA) has a density that is $C^1$. It is the only assumption that is needed (no
assumption on the dependence of the data). This model, denoted by G-E-GPD
(Gaussian-Exponential-Generalized Pareto Distribution), is characterized by its pdf
h expressed as:

$$h(x; \theta) = \begin{cases} \gamma_1\, f(x; \mu, \sigma), & \text{if } x \le u_1,\\ \gamma_2\, e(x; \lambda), & \text{if } u_1 \le x \le u_2,\\ \gamma_3\, g(x - u_2; \xi, \beta), & \text{if } x \ge u_2, \end{cases}$$

where f is the Gaussian pdf $(\mu, \sigma^2)$, e is the exponential pdf with intensity $\lambda$, g
is the GPD pdf with tail index $\xi$ and scaling parameter $\beta$, and $\gamma_1, \gamma_2, \gamma_3$ are the
weights (evaluated from the assumption).


[Figure: hybrid probability density function, showing the Gaussian component, the exponential component and the GPD component]


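Combining the facts that we are in the MDA(Fréchet) and that h is a $C^1$ pdf gives
rise to six equations relating all model parameters:

$$\beta = \xi u_2; \qquad \lambda = \frac{1+\xi}{\beta}; \qquad u_1 = \mu + \lambda \sigma^2;$$

$$\gamma_1 = \gamma_2\, \frac{e(u_1; \lambda)}{f(u_1; \mu, \sigma)}; \qquad \gamma_2 = \Big[ \xi e^{-\lambda u_2} + \Big(1 + \lambda\, \frac{F(u_1; \mu, \sigma)}{f(u_1; \mu, \sigma)}\Big) e^{-\lambda u_1} \Big]^{-1}; \qquad \gamma_3 = \beta\, \gamma_2\, e(u_2; \lambda),$$

where F denotes the Gaussian cdf. Consequently, the vector of the free parameters is reduced to $\theta = [\mu, \sigma, u_2, \xi]$.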
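To make the parameter reduction concrete, the following sketch derives the dependent quantities $\beta$, $\lambda$, $u_1$ and the three weights from the free parameters $[\mu, \sigma, u_2, \xi]$, assuming the six relations read $\beta = \xi u_2$, $\lambda = (1+\xi)/\beta$, $u_1 = \mu + \lambda\sigma^2$, with weights obtained from the $C^1$ junctions and the normalization of h. The function names and the numerical values are illustrative assumptions, not taken from [9].

```python
import math
from statistics import NormalDist

def gegpd_params(mu, sigma, u2, xi):
    """Dependent parameters of the G-E-GPD model from the free vector [mu, sigma, u2, xi]."""
    beta = xi * u2
    lam = (1.0 + xi) / beta
    u1 = mu + lam * sigma ** 2
    gauss = NormalDist(mu, sigma)
    f_u1, F_u1 = gauss.pdf(u1), gauss.cdf(u1)
    gamma2 = 1.0 / (xi * math.exp(-lam * u2)
                    + (1.0 + lam * F_u1 / f_u1) * math.exp(-lam * u1))
    gamma1 = gamma2 * lam * math.exp(-lam * u1) / f_u1      # continuity at u1
    gamma3 = beta * gamma2 * lam * math.exp(-lam * u2)      # continuity at u2
    return {"beta": beta, "lam": lam, "u1": u1,
            "gamma1": gamma1, "gamma2": gamma2, "gamma3": gamma3}

def gegpd_pdf(x, mu, sigma, u2, xi):
    """Hybrid pdf h(x; theta): Gaussian below u1, exponential on [u1, u2], GPD above u2."""
    p = gegpd_params(mu, sigma, u2, xi)
    if x <= p["u1"]:
        return p["gamma1"] * NormalDist(mu, sigma).pdf(x)
    if x <= u2:
        return p["gamma2"] * p["lam"] * math.exp(-p["lam"] * x)
    t = 1.0 + xi * (x - u2) / p["beta"]
    return p["gamma3"] * t ** (-1.0 / xi - 1.0) / p["beta"]

# Illustrative free parameters (assumed values).
mu, sigma, u2, xi = 0.0, 1.0, 3.0, 0.5
p = gegpd_params(mu, sigma, u2, xi)
total_mass = (p["gamma1"] * NormalDist(mu, sigma).cdf(p["u1"])
              + p["gamma2"] * (math.exp(-p["lam"] * p["u1"]) - math.exp(-p["lam"] * u2))
              + p["gamma3"])
print(p, total_mass)
```

The total mass equals 1 and the density is continuous at both junction points, which is a quick sanity check on the relations.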


Remark The main component in this hybrid model is the GPD one (for heavy
tail), the mean behavior having to be adapted to the context. For instance, for
insurance claims, we have replaced the Gaussian component with a Lognormal one
(Lognormal-E-GPD hybrid model).

2.4.1 Pseudo-code of the Algorithm for the G-E-GPD Parameters
Estimation


Here we describe the iterative algorithm, which self-calibrates the G-E-GPD
model, in particular the tail threshold above which a Fréchet distribution fits
the extremes. We study its convergence, proving analytically the existence of
a stationary point, then numerically that the stationary point is attractive and
unique.

1: Initialization of $\tilde p^{(0)} = [\tilde\mu^{(0)}, \tilde\sigma^{(0)}, \tilde u_2^{(0)}]$, $\alpha$, $\varepsilon > 0$, and $k_{max}$, then initialization
of $\tilde\xi^{(0)}$ (recall that $\theta = [\mu, \sigma, u_2, \xi]$):

$$\tilde\xi^{(0)} \leftarrow \underset{\xi > 0}{\mathrm{argmin}}\ \big\| H(y; \theta \mid \tilde p^{(0)}) - H_n(y) \big\|_2^2,$$

where $H_n$ is the empirical cdf of X and $y = (y_j)_{1\le j\le m}$ is a generated sequence
of synthetic increasing data of size m (that may be different from n), with
a logarithmic step, in order to increase the number of points above the tail
threshold $u_2$: $y_j = \min_{1\le i\le n}(x_i) + \big(\max_{1\le i\le n}(x_i) - \min_{1\le i\le n}(x_i)\big) \log_{10}\big(1 + \frac{9(j-1)}{m-1}\big)$.

2: Iterative process:

• $k \leftarrow 1$

Step 1: Estimation of $\tilde p^{(k)}$: $\tilde p^{(k)} \leftarrow \underset{(\mu,\sigma) \in \mathbb{R}\times\mathbb{R}_+^*,\ u_2 \in \mathbb{R}_+}{\mathrm{argmin}}\ \big\| H(y; \theta \mid \tilde\xi^{(k-1)}) - H_n(y) \big\|_2^2$

Step 2: Estimation of $\tilde\xi^{(k)}$: $\tilde\xi^{(k)} \leftarrow \underset{\xi > 0}{\mathrm{argmin}}\ \big\| H(y; \theta \mid \tilde p^{(k)}) - H_n(y) \big\|_2^2$

• $k \leftarrow k + 1$

until $\big( d(H(y; \theta^{(k)}), H_n(y)) < \varepsilon$ and $d(H(y_{q_\alpha}; \theta^{(k)}), H_n(y_{q_\alpha})) < \varepsilon \big)$ or $(k = k_{max})$

where $\varepsilon$ is a positive real that is small enough, $y_{q_\alpha}$ represents the observations
above a fixed high quantile $q_\alpha$ of arbitrary order $\alpha \ge 80\%$ associated with H,
and d(a, b) denotes the distance between a and b, chosen in this study as the
Mean Squared Error (MSE); it can be interpreted as the Cramér-von-Mises test
of goodness of fit.

3: Return $\theta^{(k)} = [\tilde\mu^{(k)}, \tilde\sigma^{(k)}, \tilde u_2^{(k)}, \tilde\xi^{(k)}]$.
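The synthetic evaluation grid used in the initialization step can be sketched as below. The exact form of the logarithmic step is an assumption here: the grid is taken as $y_j = \min x + (\max x - \min x)\log_{10}(1 + 9(j-1)/(m-1))$, which runs from the sample minimum to the sample maximum and places more points near the upper end. The function name `log_grid` is hypothetical.

```python
import math

def log_grid(x, m):
    """Increasing grid of m >= 2 points from min(x) to max(x) with a logarithmic step
    (assumed form), so that points accumulate towards the upper tail."""
    if m < 2:
        raise ValueError("m must be at least 2")
    lo, hi = min(x), max(x)
    return [lo + (hi - lo) * math.log10(1.0 + 9.0 * (j - 1) / (m - 1))
            for j in range(1, m + 1)]

y = log_grid([-2.0, 0.3, 1.1, 8.0], 11)
print(y)
```

The first gap of the grid is the widest and the last one the narrowest, so the empirical and model cdfs are compared on more points in the upper tail than in the bulk.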


2.4.2 Performance of the Method (Algorithm) Tested via MC Simulations 


To study the performance of the algorithm to self-calibrate the G-E-GPD model, we
build on MC simulations. To do so, we proceed in four steps:

1. Consider
$\{X^q = (X^q_p)_{1\le p\le n}\}_{1\le q\le N}$: training sets of length n
$\{Y^q = (Y^q_p)_{1\le p\le l}\}_{1\le q\le N}$: test sets of length l
with a G-E-GPD parent distribution with a fixed parameter vector $\theta$.

2. On each training set $X^q$, $1 \le q \le N$, evaluate $\tilde\theta^q = [\tilde\mu^q, \tilde\sigma^q, \tilde u_2^q, \tilde\xi^q]$ using our
algorithm.

3. Compute the empirical mean $\tilde a$ and variance $\tilde S_a$ of estimates of each parameter a
over the N training sets. To evaluate the performance of the estimator $\tilde a$, we use
two criteria:

(i) MSE expressed for any a as: $MSE_a = \frac{1}{N} \sum_{q=1}^{N} (\tilde a^q - a)^2$; a small value of
MSE highlights the reliability of the parameter estimation using the algorithm.

(ii) Test on the mean (with unknown variance): $H_0: \tilde a = a$ against $H_1: \tilde a \neq a$
(use for instance the normal test for a large sample)

4. Compare the hybrid pdf h (with the fixed $\theta$) with the corresponding estimated
one $\tilde h$, using $\tilde\theta^q$ on each test set $Y^q$. To do so, compute the average of the log-
likelihood function D, over N simulations, between $h(Y^q; \theta)$ and $h(Y^q; \tilde\theta^q)$:
$D = \frac{1}{Nl} \sum_{q=1}^{N} \sum_{p=1}^{l} \log\big( h(Y^q_p; \theta) / h(Y^q_p; \tilde\theta^q) \big)$. The smaller the value of D is,
the more trustworthy is the algorithm.


Several MC simulations have been performed varying $\theta$ and n, to test the
robustness of the algorithm (see [9, §4 and Appendix B]).


2.4.3 Application in Neuroscience: Neural Data 


We consider the data corresponding to 20 s, equivalent to $n = 3 \times 10^5$ observations,
of real extracellular recording of neuron activity. The information to be extracted
from these data (spikes or action potentials) lies in the extreme behaviors (left and
right) of the data (Fig. 4).


Fig. 4 One second of neural data, extracellularly recorded


608 


M. Kratz 


Table 1 Comparison between the self-calibrating method and the three graphical methods: MEP,
Hill and QQ ones

Model | Tail index ($\xi$) | Threshold ($u_2$) | $N_{u_2}$ | Distance (tail distr.) | Distance (full distr.)
GPD | MEP (PWM): 0.3326 | 1.0855 | 19,260 | 3.26 × 10^-6 |
GPD | Hill-estimator: 0.599 | 1.0855 | 19,260 | 2.07 × 10^-6 |
GPD | QQ-estimator: 0.5104 | 1.0671 | 19,871 | 1.26 × 10^-5 |
G-E-GPD | Self-calibrating method: 0.5398 | 1.0301 | 21,272 | 7.79 × 10^-6 | 9.31 × 10^-5

$N_{u_2}$ represents the number of observations above $u_2$. The distance gives the MSE between the
empirical (tail or full respectively) distribution and the estimated one from a given model (GPD or
hybrid G-E-GPD respectively). The neural data sample size is $n = 3 \times 10^5$


Since the neural data can be considered as symmetric, it is sufficient to evaluate 
the right side of the distribution with respect to its mode. 

In Table 1, we present the results obtained with the self-calibrating method, the 
MEP, Hill and QQ methods. Since the three graphical approaches fit only the tail 
distribution, the comparison of the methods will focus on the goodness-of-fit of the 
GPD component. As observed in this table, the MSE between the estimated cdf and 
the empirical one, using only data above the selected threshold, is small enough 
for the four methods ensuring a reliable modeling of extremes. The GPD threshold 
and the estimated tail index are of the same order of magnitude for all methods; it 
confirms that our algorithm works in the right direction. 

We can also notice the good performance of these methods through Fig. 5, where
we plot the empirical quantile function and the estimated ones using the self-
calibrating method and the various graphical ones. However, the advantage of our
method is that it is unsupervised, i.e. it does not need the intervention of the user to
select the threshold manually. Moreover it provides a good fit between the hybrid
cdf estimated on the entire data sample (the right side for this data set) and the
empirical cdf, with an MSE of order $10^{-5}$.


Fig. 5 Neural data: comparison between the empirical quantile function and the estimated ones
via the self-calibrating method and the graphical methods


2.4.4 Application in Finance: S&P 500 Data 


Consider the S&P500 log-returns from January 2, 1987 to February 29, 2016,
corresponding to n = 7348 observations, available in the tseries package of the R
programming language. It is well known that log-returns of financial stock indices
exhibit left and right heavy tails, with slightly different tail indices. It is important
in such a context to evaluate the nature of the tail(s) in order
to compute the capital needed by a financial institution to cover its risk, often
expressed as a Value-at-Risk (i.e. a quantile) of high order.

The S&P500 log-returns being essentially symmetric around zero (representing 
the data mode), we kept the Gaussian component to model the mean behavior when 
applying the self-calibrating method. We modelled the negative log returns and the 
positive ones, respectively, then the full data set. When focusing on tails, we also 
compare our results with those obtained with MEP, Hill, and QQ methods. We 
present them in Tables 2 and 3. We observe that all methods offer a good fit of 
the tail distribution. However, the advantage of the self-calibrating method is that it 
does not need the intervention of the user to select the threshold manually, which is 
a considerable advantage in practice. 

Now, to underline the good performance of the self-calibrating method even in
the case when data are autocorrelated with a long memory, we apply it on the
S&P500 absolute log-returns. Indeed, it is well known that the absolute values of
financial returns are autocorrelated (see Fig. 6), but also that their extremes are not
(for a thorough discussion of this point and empirical evidence, see [16]).
In times of crisis, as e.g. in 2008-2009, we observe an increase of the
dependence between various financial indices, in particular in the extremes. This is
to be distinguished from a dependence of the extremes within a univariate financial
index, which is not observed [16].

A comparison of the results obtained with our self-calibrating method and the
graphical EVT ones is given in Table 4. In Fig. 7, we also give a comparison
of the estimated quantile function using the G-E-GPD method and the graphical
(MEP, Hill and QQ) ones. Through Table 4 and Fig. 7, we can highlight once again
the good performance of the self-calibrating method to estimate the tail distribution
as well as the entire distribution of autocorrelated data. Note that the estimate of the
tail index is of the same order as those of the upper and lower tail indices evaluated
in the previous section.
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Table 2. Lower tail modeling of the S&P500 log-returns 


Distance Distance 
Model Tail index (€) Threshold (uz) Nuy (tail distr.) (positive distr.) 


GPD MEP: 0.3640 0.0270 = dog x60; 1.19 x 1077 


0.3601 
0.3813 
method: 0.3545 


Comparison between the self-calibrating method and the three graphical methods: MEP, Hill and
QQ ones, applied on the right side of the S&P500 opposite log-returns ($-X$). $N_{u_2}$ represents
the number of observations above the tail threshold $u_2$. The distance gives the MSE between the
empirical tail (from $u_2$), or positive side (for $x > 0$) respectively, distribution and the estimated
one from a given model (GPD, or hybrid G-E-GPD respectively)


Table 3 Upper tail modeling of the S&P500 log-returns 


Distance Distance 
Model Tail index (€) Threshold (u2) Nuy (tail distr.) (positive distr.) 


GPD MEP: 0.2715 0.0209 = doe so 4.91 x 10-7 


GPD Hill-estimator: 0.0288 = op sacy 4.42 x 1077 
0.3225 
0.2859 
method: 0.3360 


Comparison between the self-calibrating method and the graphical methods applied on the right 
side of the S&P500 log-returns 


Fig. 6 AutoCorrelation function (ACF) of the S&P500 absolute log-returns

Table 4 Comparison between the self-calibrating method and the three graphical methods: MEP,
Hill and QQ ones

Model | Tail index ($\xi$) | Threshold ($u_2$) | $N_{u_2}$ | Distance (tail distr.) | Distance (full distr.)
GPD | MEP: 0.3025 | 0.0282 | 206 | 1.78 × 10^-7 |
GPD | Hill-estimator: 0.3094 | 0.0382 | 85 | 4.49 × 10^-8 |
GPD | QQ-estimator: 0.3288 | 0.0323 | 137 | 6.01 × 10^-8 |
G-E-GPD | Self-calibrating method: 0.3331 | 0.0290 | 184 | 2.00 × 10^-7 | 1.05 × 10^-5

The S&P500 absolute log-returns data sample size is n = 7348


Fig. 7 S&P 500 absolute log-returns data: comparison between the empirical quantile function
and the estimated ones via the self-calibrating method and the graphical methods


3 Dependence 


3.1 Motivation 
3.1.1 Impact of the Dependence on the Diversification Benefit 


The diversification performance is at the heart of the strategy of a company. It can
be measured, for a portfolio of n risks, via the diversification benefit $D_{n,\alpha}$ at a
threshold $\alpha$ ($0 < \alpha < 1$) defined by

$$D_{n,\alpha} = 1 - \frac{\rho_\alpha\big(\sum_{i=1}^{n} L_i\big)}{\sum_{i=1}^{n} \rho_\alpha(L_i)},$$

where $\rho$ denotes a risk measure. This indicator, not universal as it depends on the number of the
risks undertaken and on the chosen risk measure $\rho$, helps to determine the optimal
portfolio of the company since the diversification reduces the risk and thus enhances
the performance. This is key to both insurance companies and financial institutions.
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Before developing an example in the insurance context (it would be the same for 
investment banks) to point out the impact of the dependence on the diversification 
benefit, let us recall some standard notions in insurance. 


Insurance Framework 


In insurance, the risk is priced based on the knowledge of the loss probability
distribution. The occurrence of a loss L being random, we define it as a random
variable (rv) on a probability space $(\Omega, \mathcal{A}, P)$ (note that, in the insurance context,
risk and loss are often used interchangeably).

The role of capital for an insurance company is to ensure that the company can
pay its liability even in the worst cases, up to some threshold.

This requires defining the capital to put behind the risk. That is why we introduce
a risk measure (say $\rho$), defined on the loss distribution, in order to estimate the
capital needed to ensure payment of the claim up to a certain confidence level.


Now let us define some useful quantities, as: 


Risk-adjusted-capital. The risk can be defined as the deviation from the expectation,
hence the notion of Risk-Adjusted-Capital (RAC), say K, which is a
function of the risk measure $\rho$ associated to the risk L, defined by $K =
\rho(L) - E[L]$.

Risk Loading. An insurance firm is a company in which shareholders can invest. They
expect a return on investment. So the insurance firm has to make sure that the
investors receive their dividends. It corresponds to the cost of capital, $\eta$, that
the insurance company must charge on its premium. Consider a portfolio of N
similar policies. The risk loading per policy, say R, is defined as

$$R = \eta\, \frac{K_N}{N} = \eta \Big( \frac{\rho(L^{(N)})}{N} - E[L] \Big),$$

where $K_N$ is the capital assigned to the entire portfolio,
$L^{(N)} = \sum_{i=1}^{N} L_i$ is the total loss of the portfolio, and $L_i \overset{d}{=} L$ is the loss incurred
by one policy (of the portfolio).
Technical risk premium. For one policy case, incurring a loss L, the technical
premium, P, that needs to be paid can be defined by $P = E(L) + \eta K + e$, where
$\eta$ is the return expected by shareholders before tax, K is the RAC (i.e. the capital
assigned to this risk), $\eta K$ corresponds to the Risk loading (per policy), and e are
the expenses incurred by the insurer to handle this case.

Assuming that the expenses are a small portion of the expected loss, i.e.
$e = a E[L]$ with $0 < a \ll 1$, then the premium can be written as $P =
(1 + a) E[L] + R$.


Introduction to EVT: Applications to QRM 613 


Generalizing to a portfolio of N similar (iid) policies, the total loss is $L^{(N)} = \sum_{i=1}^{N} L_i$,
hence the premium for one policy in the portfolio becomes:

$$P = \frac{(1 + a) E[L^{(N)}] + \eta K_N}{N} = (1 + a) E[L] + \eta\, \frac{K_N}{N},$$

where $\eta\, \frac{K_N}{N}$ is the risk loading per policy.


Let us then develop our toy model (see [4, 5]) to show the dependence impact on 
the diversification benefit. 

Suppose an insurance company has underwritten N policies of a given risk. To
price these policies, the company must know the underlying probability distribution
of this risk. Assume that each policy is exposed n times to this risk, thus in a
portfolio of N policies, the risk may occur $n \times N$ times.

Let us introduce a sequence $(X_i, i = 1, \ldots, Nn)$ of rv's $X_i$ to model the
occurrence of the risk, with a given severity l (for simplicity, take it deterministic).
Hence the total loss amount, say L, associated to this portfolio is given by

$$L = l \sum_{i=1}^{Nn} X_i := l\, S_{Nn}.$$


We are going to consider three models for the occurrence of the risk, depending 
on the dependence structure. 


(a) A First Simple Model, Under the iid Assumption 


Assume the $X_i$'s are iid (independent, identically distributed) with parent rv denoted
by X, Bernoulli distributed $\mathcal{B}(p)$, i.e. the loss $L_i = l X_i$ occurs with some
probability p:

$$X = \begin{cases} 1 & \text{with probability } p,\\ 0 & \text{with probability } 1 - p. \end{cases}$$

Hence the total loss amount $L = l\, S_{Nn}$ follows a binomial distribution $\mathcal{B}(Nn, p)$.
We can then deduce the risk loading for an increasing number N of policies in the
portfolio: $R = \eta \big( \frac{\rho(L)}{N} - l n p \big)$, in order to determine the risk premium the
insurance company will charge a customer who buys this insurance policy. The relative risk
loading per policy (relative to the expected loss $l n p$ of one policy) is then
$\frac{R}{l n p} = \eta \big( \frac{\rho(L)}{N l n p} - 1 \big)$.


Numerical application. We choose for instance the number of times one policy
is exposed to the risk as n = 6 and the unit loss l is fixed to l = 10 euros.

Computing the loss distribution, we obtain the results presented in Table 5. We
observe that the probability that the company will end up paying more than the
expectation $E(L) = l n p = 10$ is more than 26%. This makes clear why the
technical premium cannot be reduced to E(L).

Table 5 Distribution of the loss $L = l\, S_{1n}$ for one policy (N = 1) with n = 6 and p = 1/6

Number of losses k | Policy loss | Probability mass | Cdf
0 | 0 | 33.490% | 33.490%
1 | 10 | 40.188% | 73.678%
2 | 20 | 20.094% | 93.771%
3 | 30 | 5.358% | 99.130%
4 | 40 | 0.804% | 99.934%
5 | 50 | 0.064% | 99.998%
6 | 60 | 0.002% | 100.000%

Now we compute the risk loading per policy as a function of the number N of
policies in the portfolio for both risk measures VaR and TVaR, and when taking
p = 1/6 (fair game), 1/4 and 1/2, respectively. We assume that the cost (of capital)
is $\eta = 15\%$, and that the risk measure is computed at threshold $\alpha = 99\%$. Results
are given in Table 6.

Table 6 The Risk loading R per policy as a function of the number N of policies in the
portfolio (with n = 6)

Risk measure $\rho$ | Number N of policies | R with probability p = 1/4 | R with probability p = 1/2
VaR | 1 | 3.750 | 4.500
VaR | 5 | 1.650 | 1.800
VaR | 10 | 1.200 | 1.350
VaR | 50 | 0.540 | 0.600
VaR | 100 | 0.375 | 0.420
VaR | 1,000 | 0.117 | 0.135
VaR | 10,000 | 0.037 | 0.043
TVaR | 1 | 3.945 | 4.500
TVaR | 5 | 1.817 | 1.963
TVaR | 10 | 1.330 | 1.482
TVaR | 50 | 0.707 | 0.675
TVaR | 100 | 0.425 | 0.476
TVaR | 1,000 | 0.134 | 0.154
TVaR | 10,000 | 0.042 | 0.049
E[L]/N | | 15.00 | 30.00
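The VaR-based risk loading in the iid model can be sketched as follows, under the conventions of the text: unit loss l = 10, n = 6 exposures per policy, cost of capital 15% and a 99% threshold. The function names are illustrative, and the VaR of the discrete binomial loss is taken as the smallest number of claims whose cdf reaches the threshold, which is an assumption about the convention used.

```python
import math

def binom_pmf(k, m, p):
    """Binomial(m, p) probability mass at k (0 < p < 1), computed in log-space."""
    log_pmf = (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
               + k * math.log(p) + (m - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def binom_var(m, p, alpha):
    """Smallest k such that P[Binomial(m, p) <= k] >= alpha."""
    cdf = 0.0
    for k in range(m + 1):
        cdf += binom_pmf(k, m, p)
        if cdf >= alpha:
            return k
    return m

def risk_loading_var(N, n, p, l=10.0, eta=0.15, alpha=0.99):
    """Risk loading per policy R = eta * (VaR_alpha(L) / N - l * n * p), with L = l * S_{Nn}."""
    var_loss = l * binom_var(N * n, p, alpha)
    return eta * (var_loss / N - l * n * p)

# Probability of paying more than the expected loss for one policy with p = 1/6 (2 or more claims).
p_exceed = 1.0 - binom_pmf(0, 6, 1 / 6) - binom_pmf(1, 6, 1 / 6)
print(f"P[L > E(L)] = {p_exceed:.4f}")
for N in (1, 5, 10, 50, 100):
    print(N, round(risk_loading_var(N, 6, 0.25), 3), round(risk_loading_var(N, 6, 0.5), 3))
```

The loop illustrates how the loading per policy shrinks as the number of independent policies grows.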


We observe that, in the case of independent risks, the risk loading R is a
decreasing function of the number of policies N, even with a biased die. With
10,000 policies, R is divided by 100, whatever the choice of the risk measure $\rho$,
with slightly higher values for TVaR than VaR.
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(b) Introducing a Structure of Dependence to Reveal a Systematic Risk 


We introduce two types of structure of dependence between the risks, in order to
explore the occurrence of a systematic risk and, as a consequence, the limits to
diversification.

We still consider the sequence $(X_i, i = 1, \ldots, Nn)$ to model the occurrence of
the risk, with a given severity l, for N policies, but do not assume anymore that the
$X_i$'s are independent (they remain identically distributed, for the sake of simplicity). We assume
that the occurrence of the risks $X_i$ depends on another phenomenon, represented
by a random variable (rv), say U. Depending on the intensity of the phenomenon,
i.e. the values taken by U, a risk $X_i$ has more or less chance to occur.

Suppose that the dependence between the risks is totally captured by U, which is
identified with the occurrence of a state of systematic risk. Consider, w.l.o.g., that U
can take two possible values denoted by 1 and 0: $U \sim \mathcal{B}(\tilde p)$, $0 < \tilde p \ll 1$, where $\tilde p$
is chosen very small since we want to explore rare events. We present two examples
of models (i.e. two types of dependence).


(i) A dependent model, but conditionally independent
The occurrence of the risks $(X_i)_i$ is modeled by a Bernoulli rv whose
parameter is chosen depending on $U$ and such that the conditional rv's $X_i \mid U$
are independent. Since $U$ takes two possible values, the same holds for the
parameter of the Bernoulli distribution of the conditionally independent rv's
$X_i \mid U$, namely

$$X_i \mid (U = 1) \sim \mathcal{B}(q) \quad \text{and} \quad X_i \mid (U = 0) \sim \mathcal{B}(p),$$

where we choose $q \gg p$, so that whenever $U$ occurs (i.e. $U = 1$, the crisis
state), it has a big impact in the sense that there is a higher chance of loss.
We include this effect in order to have a systematic (non-diversifiable) risk in
our portfolio. Hence the probability mass function $f_S$ of the total amount of
losses $S_{Nn}$ appears as a mixture of the two probability mass functions $f_{\tilde S_q}$ and
$f_{\tilde S_p}$ of the conditional rv's $\tilde S_q := S_{Nn} \mid (U = 1) \sim \mathcal{B}(Nn, q)$ and
$\tilde S_p := S_{Nn} \mid (U = 0) \sim \mathcal{B}(Nn, p)$, respectively:

$$f_S = \tilde p \, f_{\tilde S_q} + (1 - \tilde p)\, f_{\tilde S_p}.$$

Note that $\tilde p = 0$ gives back the normal state.
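
The mixture above is easy to evaluate numerically. The sketch below builds the probability mass function of $S_{Nn}$ for model (i) and derives a risk loading from it. As assumptions of ours, it takes $R = \eta\,(\rho(L) - E(L))/N$ with $\eta = 15\%$, $l = 10$, $\alpha = 99\%$, and TVaR defined as $E[L \mid L \ge \mathrm{VaR}_\alpha(L)]$; the function names are illustrative only.

```python
from math import exp, lgamma, log


def binomial_pmf(k, n, p):
    """P[S = k] for S ~ Binomial(n, p), computed in logs."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))


def mixture_pmf(N, n=6, p=1.0 / 6.0, q=0.5, p_tilde=0.001):
    """pmf of S_Nn in model (i): p_tilde * B(Nn, q) + (1 - p_tilde) * B(Nn, p)."""
    m = N * n
    return [p_tilde * binomial_pmf(k, m, q) + (1.0 - p_tilde) * binomial_pmf(k, m, p)
            for k in range(m + 1)]


def tail_measures(pmf, alpha=0.99):
    """Return (VaR, TVaR) in numbers of losses; TVaR is E[S | S >= VaR]."""
    cdf = 0.0
    for var_k, prob in enumerate(pmf):
        cdf += prob
        if cdf >= alpha:
            break
    tail = pmf[var_k:]
    tvar = sum((var_k + j) * prob for j, prob in enumerate(tail)) / sum(tail)
    return var_k, tvar


def risk_loading_per_policy(N, measure="TVaR", l=10.0, eta=0.15, **model):
    """eta * (rho(L) - E[L]) / N for L = l * S_Nn under the mixture model."""
    pmf = mixture_pmf(N, **model)
    mean = sum(k * prob for k, prob in enumerate(pmf))
    var_k, tvar = tail_measures(pmf)
    rho = var_k if measure == "VaR" else tvar
    return eta * l * (rho - mean) / N
```

Setting `p_tilde=0.0` recovers the normal state, and comparing `risk_loading_per_policy(100, "VaR", p_tilde=0.1)` with the same call for `p_tilde=0.0` shows how little diversification helps once a crisis state is likely.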

Numerical application. As for example (a), we take $n = 6$ and $p = 1/n$.
Moreover we choose the loss probability during the crisis to be $q = 1/2$, and
explore different probabilities $\tilde p$ of occurrence of a crisis. In Table 7, the results
illustrate well the effect of the non-diversifiable risk. When the probability of
occurrence of a crisis is high, diversification no longer plays a significant role,
already with 100 contracts in the portfolio. For $\tilde p > 1\%$, the risk
loading barely changes when there is a large number of policies (starting at
$N = 1000$) in the portfolio, for both VaR and TVaR. The non-diversifiable term
dominates the risk. For lower probability $\tilde p$ of occurrence of a crisis, the choice
of the risk measure matters. For instance, when choosing $\tilde p = 0.1\%$, the risk
loading, compared to the normal state, is multiplied by 10 in the case of TVaR,
for $N = 10{,}000$ policies, and hardly moves in the case of VaR! This effect
remains, but to a lesser extent, when diminishing the number of policies. It is
clear that the VaR measure does not capture well the crisis state, while TVaR is
sensitive to the change of state, even with such a small probability and a high
number of policies.

Table 7 The risk loading per policy as a function of the probability of occurrence of a systematic
risk in the portfolio, using VaR and TVaR measures with $\alpha = 99\%$. The probability of giving a loss in a state of systematic risk is chosen to be $q = 50\%$

Risk measure $\rho$ | Number $N$ of policies | Risk loading $R$ in a normal state | with occurrence of a crisis state ($\tilde p = 5.0\%$ | $\tilde p = 10.0\%$)

VaR
5.693
5 | 3.450 | 3.900
10 | 1.047 | 3.300 | 3.450
50 | 0.477 | 3.060 | 3.030
100 | 0.327 | 2.940
1000 | 0.101 | 2.900 | 2.775
10,000 | 0.029 | 2.866 | 2.724

TVaR
1 | 3.226 | 3.232 | 4.711 | 4.755 | 5.899
5 | 1.707 | 3.823 | 4.146
10 | 1.266 | 3.665
50 | 0.760 | 3.196 | 3.141
100 | 0.596 | 3.098 | 3.020
1000 | 0.396 | 2.802
10,000 | 0.323 | 2.732

E[L]/N | 10.00 | 10.02 | 11.00 | 12.00

(ii) A more realistic model setting to introduce a systematic risk

We adapt further the previous setting to a more realistic description of a
crisis. At each of the $n$ exposures to the risk, in a state of systematic risk,
the entire portfolio will be touched by the same increased probability of loss,
whereas, in a normal state, the entire portfolio will be subject to the same
equilibrium probability of loss.

For this modeling, it is more convenient to rewrite the sequence $(X_i, i = 1, \ldots, Nn)$
with a vectorial notation, namely $(\mathbf{X}_j, j = 1, \ldots, n)$ where the
vector $\mathbf{X}_j$ is defined by $\mathbf{X}_j = (X_{1j}, \ldots, X_{Nj})^T$. Hence the total loss amount
$S_{Nn}$ can be rewritten as

$$S_{Nn} = \sum_{j=1}^{n} \tilde S^{(j)}, \quad \text{where } \tilde S^{(j)} \text{ is the sum of the components of } \mathbf{X}_j: \quad \tilde S^{(j)} = \sum_{i=1}^{N} X_{ij}.$$
We keep the same notation for the Bernoulli rv $U$ determining the state and
for its parameter $\tilde p$. But now, instead of defining a normal ($U = 0$) or a crisis
($U = 1$) state on each element of $(X_i, i = 1, \ldots, Nn)$, we do it on each vector
$\mathbf{X}_j$, $1 \le j \le n$.
It comes back to define a sequence of iid rv's $(U_j, j = 1, \ldots, n)$ with parent
rv $U$. We deduce that $\tilde S^{(j)}$ follows a Binomial distribution whose probability
depends on $U_j$:

$$\tilde S^{(j)} \mid (U_j = 1) \sim \mathcal{B}(N, q) \quad \text{and} \quad \tilde S^{(j)} \mid (U_j = 0) \sim \mathcal{B}(N, p),$$

and these conditional rv's are independent.
Let us introduce the event $A_l$ defined, for $l = 0, \ldots, n$, as

$$A_l := \{ l \text{ vectors } \mathbf{X}_j \text{ are exposed to a crisis state and } n - l \text{ to a normal state} \} = \Big( \sum_{j=1}^{n} U_j = l \Big),$$

whose probability is given by

$$P(A_l) = P\Big( \sum_{j=1}^{n} U_j = l \Big) = \binom{n}{l}\, \tilde p^{\,l} (1 - \tilde p)^{n-l}.$$

We can then write, with, by conditional independence,

$$\tilde S_q^{(l)} := \sum_{j=1}^{l} \big( \tilde S^{(j)} \mid U_j = 1 \big) \sim \mathcal{B}(Nl, q)$$

and

$$\tilde S_p^{(n-l)} := \sum_{j=1}^{n-l} \big( \tilde S^{(j)} \mid U_j = 0 \big) \sim \mathcal{B}(N(n-l), p),$$

that

$$P(S_{Nn} = k) = \sum_{l=0}^{n} P(S_{Nn} = k \mid A_l)\, P(A_l) = \sum_{l=0}^{n} \binom{n}{l}\, \tilde p^{\,l} (1 - \tilde p)^{n-l}\, P\big[ \tilde S_q^{(l)} + \tilde S_p^{(n-l)} = k \big].$$
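
For a small portfolio, the last sum can be evaluated numerically by convolving the two Binomial distributions for each $l$. The sketch below does so; it is meant for small $N$ only (the direct Binomial formula is not numerically safe for very large $N$), and the function names are ours.

```python
from math import comb


def binom_pmf_list(m, p):
    """[P[B = k] for k = 0..m] with B ~ Binomial(m, p); m = 0 gives [1.0]."""
    return [comb(m, k) * p**k * (1.0 - p) ** (m - k) for k in range(m + 1)]


def convolve(a, b):
    """pmf of the sum of two independent non-negative integer rv's."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def model_ii_pmf(N, n=6, p=1.0 / 6.0, q=0.5, p_tilde=0.05):
    """pmf of S_Nn in model (ii), summing over the events A_l."""
    total = [0.0] * (N * n + 1)
    for l in range(n + 1):
        # P(A_l): exactly l of the n exposures are in a crisis state
        weight = comb(n, l) * p_tilde**l * (1.0 - p_tilde) ** (n - l)
        # distribution of B(N l, q) + B(N (n - l), p)
        cond = convolve(binom_pmf_list(N * l, q), binom_pmf_list(N * (n - l), p))
        for k, prob in enumerate(cond):
            total[k] += weight * prob
    return total
```

A quick sanity check is that the resulting probabilities sum to one and that the mean equals $Nn(\tilde p\,q + (1 - \tilde p)\,p)$.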


Numerical example revisited: In this case, we cannot directly use an explicit
expression for the distributions, so we go through Monte Carlo simulations.

At each of the $n$ exposures to the risk, first choose between a normal or a crisis
state. Since we take here $n = 6$, the chance of choosing a crisis state when
$\tilde p = 0.1\%$ is very small. To get enough crisis states, we need to run enough
simulations, and then average over all the simulations. The results shown in
Table 8 are obtained with ten million simulations (we also ran it with 1 and 20
million simulations to check the convergence).

Table 8 The risk loading per policy as a function of the probability of occurrence of a systematic
risk in the portfolio, using VaR and TVaR measures with $\alpha = 99\%$. The probability of giving a loss in a state of systematic risk is chosen to be $q = 50\%$

Risk measure $\rho$ | Number $N$ of policies | Risk loading $R$ in a normal state | with occurrence of a crisis state ($\tilde p = 5.0\%$ | $\tilde p = 10.0\%$)

1000 | 1.013 | 1.295
10,000 | 0.981 | 1.276
100,000 | 1.269
E[L]/N | 11.00 | 12.00

VaR | 4.350 | 4.200
1.800
1.500
1.200
1.170
1.186
1.196
1.199

TVaR | 4.48
5 | 2.226
10 | 1.804
50 | 1.408
100 | 1.358
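
The simulation procedure described above can be sketched as follows. This is a small-scale illustration under our own assumptions (loading $R = \eta\,(\rho(L) - E(L))/N$ with $\eta = 15\%$ and $l = 10$, empirical quantile for VaR, TVaR as the mean of the draws at or above VaR); it is not the code used to produce Table 8, and far more simulations are needed when $\tilde p$ is small.

```python
import math
import random


def simulate_total_losses(N, rng, n=6, p=1.0 / 6.0, q=0.5, p_tilde=0.05):
    """One draw of S_Nn: each of the n exposures is in crisis with prob. p_tilde."""
    total = 0
    for _ in range(n):
        in_crisis = rng.random() < p_tilde
        prob = q if in_crisis else p
        # N conditionally independent Bernoulli(prob) losses for this exposure
        total += sum(1 for _ in range(N) if rng.random() < prob)
    return total


def monte_carlo_risk_loading(N, n_sims=20000, alpha=0.99, l=10.0, eta=0.15,
                             seed=0, **model):
    """Empirical VaR/TVaR risk loadings per policy and E[L]/N."""
    rng = random.Random(seed)
    draws = sorted(simulate_total_losses(N, rng, **model) for _ in range(n_sims))
    mean = sum(draws) / n_sims
    var = draws[max(0, math.ceil(alpha * n_sims - 1e-9) - 1)]  # empirical quantile
    tail = [s for s in draws if s >= var]
    tvar = sum(tail) / len(tail)
    scale = eta * l / N
    return {"VaR": scale * (var - mean),
            "TVaR": scale * (tvar - mean),
            "E[L]/N": l * mean / N}
```

For instance, `monte_carlo_risk_loading(5, p_tilde=0.05)` returns an estimate of $E[L]/N$ that should be close to the value 11.00 reported in the last row of Table 8.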


The diversification due to the total number of policies is more effective for
this model than for the previous one, but we still experience a part which is not
diversifiable. We also computed the case with 100,000 policies (feasible since we
use Monte Carlo simulations). As expected, the risk loading in the normal state continues to
decrease. In this state, it decreases by 10. However, except for $\tilde p = 0.1\%$ in
the VaR case, the decrease becomes very slow when we allow for a crisis state
to occur. The behavior of this model is more complex than the previous one,
but more realistic, and we also reach the non-diversifiable part of the risk. For a
high probability of occurrence of a crisis (one every 10 years), the limit with VaR is
reached already at 100 policies, while, with TVaR, it continues to decrease slowly.
Concerning the choice of risk measure, we see a similar behavior as in the previous
case for $N = 10{,}000$ and $\tilde p = 0.1\%$: VaR is unable to catch the possible
occurrence of a crisis state, which shows its limitation as a risk measure. Although
we know that there is a part of the risk that is non-diversifiable, VaR does not really
catch it when $N = 10{,}000$ or $100{,}000$, while TVaR does not decrease significantly
between 10,000 and 100,000, reflecting the fact that the risk cannot be completely
diversified away.

Table 9 Summary of the analytical results (expectation and variance per policy) for the three
cases of biased games ($L = l\,S_{Nn}$)

Case | Expectation $\frac{1}{N} E(L)$ | Variance $\frac{1}{N^2} \mathrm{var}(L)$
(a) | $l\,n\,q$ | $\frac{l^2 n}{N}\, q(1-q)$
(b)-(i) | $l\,n\,\big(\tilde p\, q + (1-\tilde p)\, p\big)$ | $\frac{l^2 n}{N} \big( q(1-q)\tilde p + p(1-p)(1-\tilde p) \big) + l^2 n^2 (q-p)^2\, \tilde p (1-\tilde p)$
(b)-(ii) | $l\,n\,\big(\tilde p\, q + (1-\tilde p)\, p\big)$ | $\frac{l^2 n}{N} \big( q(1-q)\tilde p + p(1-p)(1-\tilde p) \big) + l^2 n\, (q-p)^2\, \tilde p (1-\tilde p)$


Discussion: Comparison of the Methods 


In Table 9, we see in the first case (a) that the variance decreases with increasing 
N, while both other cases (b) (i and ii) contain a term in the variance that does not 
depend on N. Those two cases are those containing a systematic risk component 
that cannot be diversified. Note that the variance var2(L) of L in the case (b)-(i) 
contains a non-diversified part that corresponds to n times the non-diversified part 
of var3(L) in the case (b)-(ii). 
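
The two variance formulas with a systematic component can be recovered from the law of total variance. The derivation below is a sketch written for this purpose; it only uses the model assumptions stated above (Bernoulli state variables with parameter $\tilde p$ and conditionally independent Binomial losses), and the intermediate notation is ours.

```latex
% Case (b)-(i): a single state U drives all Nn exposures.
\begin{align*}
\operatorname{var}(S_{Nn})
  &= E\big[\operatorname{var}(S_{Nn} \mid U)\big]
     + \operatorname{var}\big(E[S_{Nn} \mid U]\big) \\
  &= Nn \big( q(1-q)\tilde p + p(1-p)(1-\tilde p) \big)
     + (Nn)^2 (q-p)^2\, \tilde p (1-\tilde p).
\end{align*}
% Dividing var(L) = l^2 var(S_{Nn}) by N^2 gives the (b)-(i) entry:
% the second term, l^2 n^2 (q-p)^2 \tilde p (1-\tilde p), does not depend on N.

% Case (b)-(ii): S_{Nn} is a sum of n independent blocks \tilde S^{(j)},
% each driven by its own state U_j.
\begin{align*}
\operatorname{var}\big(\tilde S^{(j)}\big)
  &= N \big( q(1-q)\tilde p + p(1-p)(1-\tilde p) \big)
     + N^2 (q-p)^2\, \tilde p (1-\tilde p), \\
\operatorname{var}(S_{Nn})
  &= n \operatorname{var}\big(\tilde S^{(1)}\big)
   = Nn \big( q(1-q)\tilde p + p(1-p)(1-\tilde p) \big)
     + n N^2 (q-p)^2\, \tilde p (1-\tilde p).
\end{align*}
% After division by N^2, the non-diversifiable term is
% l^2 n (q-p)^2 \tilde p (1-\tilde p), i.e. 1/n times the one of case (b)-(i).
```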

To conclude, we have seen the effect of diversification on the pricing of
insurance risk through a simple modeling that allows for a straightforward analytical
evaluation of the impact of the non-diversified part. In real life, risk takers have to
pay special attention to the effects that can weaken the diversification benefits, and hence
greatly affect the risk loading of the risk premium (as seen here). Various examples
illustrate this situation: in motor insurance, a hail storm may hit a large number
of cars at the same time and thus cannot be diversified among the various policies;
in life insurance, a pandemic or a mortality trend would affect the entire portfolio
and cannot be diversified away; and a financial crisis suddenly increases the
dependence between risks (systemic risk). There is a
saying among traders: “Diversification works the best when you need it the least”.

Understanding the dependence between risks is crucial for solid risk management.
For portfolio management, we need to include both the single risk model and
the dependence model.


3.1.2 Type of Dependence 


Consider a portfolio of political risks, the two largest ones being those of China and
Hong-Kong, with 22.5% linear correlation. A customer asks a reinsurer for a cover
of those extreme risks, providing their marginal distributions and the following
simulation results:

[Figure: scatterplot of the customer's simulations; x-axis: China, y-axis: Hong-Kong ($r = 22.5\%$); 10 extreme events]


Applying the reinsurance structure (rectangle) to the customer's simulations, we
find ten relevant events in it. However, in this model, the conditional probability
for Hong-Kong to default on the risk, given that China defaults with a probability
of 1/200 years (i.e. 0.5%), would be less than 5%, which is totally
unrealistic, given the political situation of dependence of Hong-Kong on China!

Hence the reinsurer decides to study this portfolio, using the same margins, but
suggesting a dependence structure via a Clayton copula, calibrated to have the
same linear correlation of 22.5%. He obtains a much more realistic conditional
probability of default of 60%, which gives 21 relevant events in the reinsurance
structure. Simply applying a non-linear dependence structure increases the number
of events by a factor 2 and the average loss for the reinsurer by a factor 3. Of course,
the price of such a cover would be much higher than what the customer expected
given his model.

[Figure: scatterplot with the Clayton dependence structure; x-axis: China, y-axis: Hong-Kong ($r = 22.5\%$); 21 extreme events]

This example shows that the type of dependence considered for the modeling
matters a lot when considering the risk!
3.2 Notion of Dependence


How do we analyze a phenomenon in view of understanding it better, then modeling it?
Modeling is a simplification but must not be a reduction! It is the fundamental basis
of a scientific approach. “Everything should be made as simple as possible, but not
simpler” (saying attributed to Albert Einstein).

We proceed from the simplest tools to more elaborate ones, when needed. In
terms of dependence, in a multivariate context, it means looking at the rv's
from independence to linear dependence to non-linear dependence. Studying the
dependence between risks is essential for understanding their real impacts and
consequences. There exist many ways of describing dependence or association
between rv's, e.g. linear correlation coefficient, rank correlations (Kendall's tau,
Spearman's rho), ...

Let us present a brief historical overview. Dependence has always been a topic in 
probability and statistics when looking at what is called a multivariate framework. 
Notions like linear correlation or copula, for instance, were introduced to treat this 
problem. 


— In 1895: Karl Pearson [30] formalized mathematically the notion of linear
correlation (first introduced by Galton in the context of biometric studies). While
independence implies linear independence, the converse is false (except in the
elliptical case), as illustrated in Fig. 8.

— In 1959: Abe Sklar [35] introduced (in the context of probability theory, to solve
a theoretical problem posed by Fréchet) the more general concept of dependence
structure, also called copula, separating this structure from the margins.


[Figure: scatterplots in the $(x, y)$ plane, annotated with $\rho_{XY} = 0 \Rightarrow$ linear independence]

Fig. 8 Linear versus stochastic independence
Fig. 9 Scatterplots of $(X_1, X_2)$ with normal margins, linear correlation $\rho = 70\%$, and,
respectively, a Gaussian copula (left plot) and a Gumbel copula (right plot)


For instance, consider two random vectors having the same standard normal
margins and a linear correlation of 70%, but a different dependence structure, a
Gaussian copula and a Gumbel one, respectively. We clearly see in Fig. 9 how
different they are.

It emphasizes the fact that knowing the marginal distributions and linear cor- 
relation is not enough for determining the joint distribution, except for elliptical 
distributions (as e.g. the Gaussian ones). 


— In 1984: Paul Deheuvels [10] introduced the notion of extreme-value copula.

— From the 1970s, diverse types of dependence have been studied in mathematical
statistics and probability.

— From the twenty-first century, those dependence tools have been introduced in
the industry: copulas have become an important tool for applications and
the evaluation of risks in insurance and reinsurance (and later in finance:
non-linear tools cannot and should not be ignored anymore, especially after the second
most severe financial crisis, starting in 2008).

— After the 2008 financial crisis, Extreme Value Theory (EVT) finally entered the
financial world (academics and professionals). The fact that risks are more
interdependent in extreme situations led to the development of the notion of
systemic risks, i.e. risks that would affect the entire system, as well as the notion
of systematic risks, where components are present in all other risks.


The world has changed a lot, from the end of the nineteenth to the early twentieth
century, when using the concept of linear correlation (Pearson) was of great help, to
nowadays, when the world is getting more complex, and more and more interconnected
(see [6] for a discussion of this point). In the next few years, research in statistics
and probability will have to make significant progress in this area if we want to
master the risk at an aggregate level. We have seen that societal demand goes in this
direction, looking for protection at a global level.


3.3 Copulas 


Definition 1 A copula is a multivariate distribution function $C : [0, 1]^d \to [0, 1]$
with standard uniform margins, i.e. $C(1, \ldots, 1, u_i, 1, \ldots, 1) = u_i$, $\forall i \in \{1, \ldots, d\}$,
$u_i \in [0, 1]$.


Sklar showed in [35] how a unique copula $C$ fully describes the dependence of
$X$, proving the following theorem.


Theorem 4 (Sklar's Theorem, 1959) Let $F$ be a joint cdf with margins $(F_i, i = 1, \ldots, d)$.
There exists a copula $C$ such that

$$F(x_1, \ldots, x_d) = C\big(F_1(x_1), \ldots, F_d(x_d)\big), \quad \forall x_i \in \mathbb{R},\ i = 1, \ldots, d.$$

If the margins are continuous then $C$ is unique.
Conversely, if $C$ is a copula and $(F_i, 1 \le i \le d)$ are univariate cdf's, then $F$
defined above is a multivariate cdf with margins $F_1, \ldots, F_d$.


Proof as an exercise. 
As a consequence, we can give another definition of a copula. 


Definition 2 The copula of $(X_1, \ldots, X_d)$ (or $F$) is the cdf $C$ of $(F_1(X_1), \ldots, F_d(X_d))$.


We sometimes refer to $C$ as the dependence structure of $F$.
Here is a useful way to express Sklar's theorem in dimension 2:
Theorem 5 (Sklar, dim 2) Let $F$ be a joint cdf with margins $(F_1, F_2)$. The copula
$C$ associated to $F$ can be written as

$$C(u_1, u_2) = C\big(F_1(x_1), F_2(x_2)\big) = F(x_1, x_2) = F\big(F_1^{-1}(u_1), F_2^{-1}(u_2)\big).$$


If the margins are continuous then C is unique. 
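
To make the construction concrete, here is a minimal sketch of Sklar's theorem in dimension 2: a joint cdf is assembled from a copula and two margins. The choice of the Clayton copula (introduced further below among the Archimedean examples) and of exponential margins is ours, purely for illustration.

```python
from math import exp, inf


def clayton(u, v, beta=2.0):
    """Clayton copula; the boundary cases u = 0 or v = 0 are handled explicitly."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -beta + v ** -beta - 1.0) ** (-1.0 / beta)


def exp_cdf(rate):
    """cdf of an exponential distribution with the given rate."""
    return lambda x: 1.0 - exp(-rate * x) if x > 0.0 else 0.0


def joint_cdf(copula, F1, F2):
    """Sklar's construction F(x1, x2) = C(F1(x1), F2(x2))."""
    return lambda x1, x2: copula(F1(x1), F2(x2))


F1, F2 = exp_cdf(1.0), exp_cdf(0.5)
F = joint_cdf(clayton, F1, F2)
```

Letting one argument go to infinity recovers the corresponding margin, e.g. `F(1.0, inf)` equals `F1(1.0)`, which is the uniform-margin property of the copula seen through the margins.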


Copulas satisfy a property of invariance, very useful in practice, and which is not
satisfied by the linear correlation.


Property 1 (Property of Invariance) $C$ is invariant under strictly increasing
transformations of the marginals. If $T_1, \ldots, T_d$ are strictly increasing, then
$(T_1(X_1), \ldots, T_d(X_d))$ has the same copula as $(X_1, \ldots, X_d)$.
As for probability distributions, we can define the notion of density function,
when it exists.


Definition 3 The density function $c$ of a copula $C$ is defined by

$$c(u_1, \ldots, u_d) = \frac{\partial^d C(u_1, \ldots, u_d)}{\partial u_1 \cdots \partial u_d}.$$

The density function of a bivariate distribution can be written in terms of the density
function $c$ of the associated copula and in terms of the density functions $f_1$ and $f_2$
of the margins:

$$f(x_1, x_2) = c\big(F_1(x_1), F_2(x_2)\big)\, f_1(x_1)\, f_2(x_2).$$


Using this definition, we can prove that the product copula characterizes the
independence between two rv's. More generally, $X_1, \ldots, X_d$ are mutually independent if
and only if their copula $C$ satisfies $C(u_1, \ldots, u_d) = \prod_{i=1}^{d} u_i$.

Like the linear correlation, which is bounded between $-1$ and $1$, a copula also admits
bounds, named Fréchet-Hoeffding bounds:

$$\max\Big( \sum_{i=1}^{d} u_i + 1 - d,\ 0 \Big) \le C(\mathbf{u}) \le \min_{1 \le i \le d} u_i = P[U \le u_1, \ldots, U \le u_d]$$

where $\mathbf{u} = (u_1, \ldots, u_d)$ and $U$ is uniformly distributed on $[0, 1]$.


[Figure: three scatterplots, from left to right: lower Fréchet-Hoeffding bound, independent copula random generation, upper Fréchet-Hoeffding bound]


The upper Fréchet-Hoeffding bound $C_u(u_1, \ldots, u_d) = \min_{1 \le i \le d} u_i$ is a copula for
any $d$. It describes perfect dependence, also named comonotonicity:

$$X_i = T_i(X_1), \text{ with } T_i \text{ a strictly increasing function},\ i = 2, \ldots, d \iff C_u \text{ satisfies } C_u(u_1, \ldots, u_d) = \min_{1 \le i \le d} u_i.$$

The lower Fréchet-Hoeffding bound $C_l(\mathbf{u}) := \max\big( \sum_{i=1}^{d} u_i + 1 - d,\ 0 \big)$ is
a copula for $d = 2$, but not for all $d > 2$. For $d = 2$, $C_l$ describes perfect
negative dependence, also named countermonotonicity:

$$X_2 = T(X_1), \text{ with } T \text{ a strictly decreasing function} \iff C_l(u_1, u_2) := \max(u_1 + u_2 - 1,\ 0).$$
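
The two claims about the bounds can be checked numerically. The sketch below (function names are ours) verifies on a grid that the independence copula lies between the Fréchet-Hoeffding bounds, and shows why the lower bound is not a copula for $d = 3$: the "probability" it would assign to the box $[1/2, 1]^3$ is negative.

```python
from itertools import product


def lower_bound(u):
    """Lower Fréchet-Hoeffding bound max(sum(u) + 1 - d, 0)."""
    return max(sum(u) + 1.0 - len(u), 0.0)


def upper_bound(u):
    """Upper Fréchet-Hoeffding bound min(u)."""
    return min(u)


def independence(u):
    """Product (independence) copula."""
    result = 1.0
    for ui in u:
        result *= ui
    return result


def box_volume(C, lower, upper):
    """C-volume of the box [lower_1, upper_1] x ... x [lower_d, upper_d].

    For a genuine copula this is a probability, hence non-negative.
    """
    d = len(lower)
    volume = 0.0
    for corner in product((0, 1), repeat=d):
        point = [upper[i] if corner[i] else lower[i] for i in range(d)]
        sign = (-1) ** (d - sum(corner))  # minus sign for each lower endpoint
        volume += sign * C(point)
    return volume
```

With these definitions, `box_volume(lower_bound, [0.5] * 3, [1.0] * 3)` is negative, whereas the same computation in dimension 2 gives a non-negative value.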


Examples of Copulas Since any type of dependence structure can exist, the same
can be said about copulas, which is why many new copulas are introduced by
researchers. Here let us define three standard classes of copulas, already in use
among practitioners, among which the Extreme Value (EV) copulas (which can
overlap the two other classes).


• Elliptical or normal mixture copulas, as for instance:
— The Gaussian copula (often used in financial modeling); in dimension 2, with
parameter $\alpha \in (-1, 1)$, it is defined via Sklar's theorem by

$$C^{Ga}_{\alpha}(u, v) = \frac{1}{2\pi \sqrt{1 - \alpha^2}} \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)} \exp\left\{ -\frac{x^2 - 2\alpha x y + y^2}{2(1 - \alpha^2)} \right\} dx\, dy,$$

where $\Phi$ denotes the standard normal distribution.

— The Student-t copula is the distribution of $(T(X_i), i = 1, \ldots, d)$ where $T$ is
the t-cdf and $(X_i, i = 1, \ldots, d)$ has a joint t-distribution. For $d = 2$, it can be
expressed (via Sklar's theorem) as:

$$C^{t}_{\nu, \alpha}(u, v) = \frac{1}{2\pi \sqrt{1 - \alpha^2}} \int_{-\infty}^{t_\nu^{-1}(u)} \int_{-\infty}^{t_\nu^{-1}(v)} \left( 1 + \frac{x^2 - 2\alpha x y + y^2}{\nu (1 - \alpha^2)} \right)^{-(\nu + 2)/2} dx\, dy$$

for $\alpha \in (-1, 1)$ and degrees of freedom $\nu > 2$.
• Archimedean copulas.


Definition. An Archimedean copula $C$ is defined by

$$C(u_1, \ldots, u_d) = \psi^{-1}\big( \psi(u_1) + \cdots + \psi(u_d) \big)$$

where $\psi : \,]0, 1] \to [0, \infty)$ is continuous, strictly decreasing, convex, and
satisfies $\psi(1) = 0$ and $\lim_{t \to 0} \psi(t) = +\infty$; set $\psi^{-1}(t) = 0$ if $\psi(0) \le t \le +\infty$.


We call $\psi$ the strict generator of $C$.
Archimedean copulas are exchangeable, i.e. invariant under permutation.
Examples:


— Gumbel copula, defined in dimension 2 by:

$$C^{Gu}_{\beta}(u, v) = \exp\Big\{ -\big( (-\log u)^{\beta} + (-\log v)^{\beta} \big)^{1/\beta} \Big\}, \quad \text{with } \beta \ge 1.$$

When $\beta = 1$: $C^{Gu}_{1}(u, v) = uv$, pointing out the independence of the variables.
When $\beta \to \infty$, the variables tend to be comonotonic: $C^{Gu}_{\infty}(u, v) = \min(u, v)$.
— Clayton copula, defined in dimension 2 by:

$$C^{Cl}_{\beta}(u, v) = \big( u^{-\beta} + v^{-\beta} - 1 \big)^{-1/\beta}, \quad \text{with } \beta > 0.$$

We have $\lim_{\beta \to 0} C^{Cl}_{\beta}(u, v) = uv$, the independence case, and
$\lim_{\beta \to \infty} C^{Cl}_{\beta}(u, v) = \min(u, v)$, the comonotonic case.


• Extreme Value (EV) Copulas
A copula $C$ is said to be an Extreme Value (EV) copula if it satisfies the
max-stability characteristic property:

$$\forall \gamma > 0, \quad C^{\gamma}(u_1, \ldots, u_d) = C(u_1^{\gamma}, \ldots, u_d^{\gamma}).$$

An alternative definition is the following, that we state e.g. in dimension 2:

$$C(u, v) = \exp\left\{ (\log u + \log v)\, A\left( \frac{\log u}{\log u + \log v} \right) \right\}$$

where $A$, called the dependence (or Pickands) function, is convex on $[0, 1]$ and
satisfies $A(0) = A(1) = 1$ and $\max(1 - w, w) \le A(w) \le 1$, $\forall w \in [0, 1]$.
The function $A$ can be defined from the EV copula $C$ by setting

$$A(w) = -\ln C\big( e^{-w}, e^{-(1-w)} \big), \quad w \in [0, 1].$$

Bounds have also been provided for $A$. If the upper bound is reached, i.e. if
$A(w) = 1$, $\forall w$, then $C$ is the independence copula. If the lower bound is reached,
i.e. if $A(w) = \max(w, 1 - w)$, then it is the comonotonicity copula.

Examples of EV copulas: independence copula, comonotonicity copula,
Gumbel copula (it is a parametric EV copula; the Gumbel copula model is
sometimes known as the logistic model), Galambos copula (as defined below).

Let us consider the dependence function $A$, introduced by Galambos and
defined, for $0 \le w \le 1$, with $0 \le \alpha, \beta \le 1$ and $\theta > 0$, by

$$A(w) = 1 - \Big( (\alpha w)^{-\theta} + \big( \beta (1 - w) \big)^{-\theta} \Big)^{-1/\theta}.$$

We can check that $A$ is a convex function having the right bounds for the
definition of an EV copula, so that we can create an EV copula from $A$. We
obtain the bivariate EV copula, named Galambos copula (this copula model is
sometimes known as the negative logistic model),

$$C^{Gal}_{\theta, \alpha, \beta}(u, v) = u\, v \exp\Big\{ \big( (-\alpha \ln u)^{-\theta} + (-\beta \ln v)^{-\theta} \big)^{-1/\theta} \Big\}.$$

It is represented in Fig. 10 [27, p. 313].
To learn more about copulas, refer e.g. to [28] and [20].
Fig. 10 Plot of dependence function for (a) the symmetric Galambos copula ($\alpha = \beta = 1$), spanning the
whole range from independence to comonotonicity, and (b) the asymmetric Galambos copula with
$\alpha = 0.9$ and $\beta = 0.8$; the limit as $\theta \to 0$ is the independence model, whereas as $\theta \to \infty$, it is
no longer the comonotonicity model. Dashed lines show boundaries of the triangle in which the
dependence function must reside; solid lines show dependence functions for a range of $\theta$ values
running from 0.2 to 5 in steps of size 0.1
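
Several statements of this section can be checked numerically. The sketch below builds the Clayton and Gumbel copulas from Archimedean generators, examines their limiting cases, and tests the max-stability property and the Pickands function for the Gumbel copula. The generators $\psi(t) = (t^{-\beta} - 1)/\beta$ (Clayton) and $\psi(t) = (-\log t)^{\beta}$ (Gumbel) are the standard ones and are not given in the text above; the function names are ours.

```python
from math import exp, log


def gumbel(u, v, beta):
    """Bivariate Gumbel copula, beta >= 1."""
    return exp(-((-log(u)) ** beta + (-log(v)) ** beta) ** (1.0 / beta))


def clayton(u, v, beta):
    """Bivariate Clayton copula, beta > 0."""
    return (u ** -beta + v ** -beta - 1.0) ** (-1.0 / beta)


def archimedean(psi, psi_inv):
    """Build C(u, v) = psi_inv(psi(u) + psi(v)) from a strict generator psi."""
    return lambda u, v: psi_inv(psi(u) + psi(v))


def pickands(C, w):
    """Dependence function A(w) = -ln C(exp(-w), exp(-(1 - w)))."""
    return -log(C(exp(-w), exp(-(1.0 - w))))


beta = 2.0
clayton_from_generator = archimedean(
    lambda t: (t ** -beta - 1.0) / beta,         # assumed Clayton generator
    lambda s: (1.0 + beta * s) ** (-1.0 / beta),
)
gumbel_from_generator = archimedean(
    lambda t: (-log(t)) ** beta,                 # assumed Gumbel generator
    lambda s: exp(-s ** (1.0 / beta)),
)
```

Evaluating `pickands(lambda u, v: gumbel(u, v, 2.0), w)` on a grid of `w` values gives a curve between $\max(w, 1 - w)$ and $1$, while the Clayton copula fails the max-stability identity, so it is not an EV copula.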


3.4 Notion of Rank Correlation 


Let us introduce two rank correlations, the Spearman's rho $\rho_S$ and the Kendall's tau
$\rho_\tau$, which can also be expressed in terms of copulas (see e.g. [27, §5.2]).

Let $C$ denote the copula of $(X_1, X_2)$, and $\rho$ the Pearson (linear) correlation of
$X_1$ and $X_2$.


• The Spearman's rho $\rho_S$ is defined by

$$\rho_S(X_1, X_2) = \rho\big( F_1(X_1), F_2(X_2) \big) = \rho(\text{copula})$$

and also by

$$\rho_S(X_1, X_2) = 12 \int_0^1 \int_0^1 \big( C(u_1, u_2) - u_1 u_2 \big)\, du_1\, du_2.$$

• The Kendall's tau $\rho_\tau$ is defined by

$$\rho_\tau(X_1, X_2) = 2\, P\big[ (X_1 - \tilde X_1)(X_2 - \tilde X_2) > 0 \big] - 1$$

with $(\tilde X_1, \tilde X_2)$ an independent copy of $(X_1, X_2)$, and also by

$$\rho_\tau(X_1, X_2) = 4 \int_0^1 \int_0^1 C(u_1, u_2)\, dC(u_1, u_2) - 1.$$

Case of elliptical models: Suppose $X = (X_1, X_2)$ has any elliptical distribution
(e.g. $X$ has a Student distribution $t(\nu, \mu, \Sigma)$). Then

$$\rho_\tau(X_1, X_2) = \frac{2}{\pi} \arcsin\big( \rho(X_1, X_2) \big).$$

Note that if $X_i$ has infinite variance, then $\rho(X_1, X_2)$ can be interpreted as
$\Sigma_{12} / \sqrt{\Sigma_{11} \Sigma_{22}}$.


3.4.1 Properties of Rank Correlations 


We can state the following properties for the Spearman's rho $\rho_S$. The same
holds true for Kendall's tau $\rho_\tau$. But those properties are not shared by the linear
correlation.


1. $\rho_S$ depends only on the copula of $(X_1, X_2)$;

2. $\rho_S$ is invariant under strictly increasing transformations of the rv's;
3. $\rho_S(X_1, X_2) = 1 \iff C(X_1, X_2)$ is comonotonic;

4. $\rho_S(X_1, X_2) = -1 \iff C(X_1, X_2)$ is countermonotonic.
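
The invariance property and the elliptical formula for Kendall's tau can be illustrated on simulated data. The sketch below (function names and the simulation set-up are ours) draws a bivariate Gaussian sample, computes the sample Kendall's tau, and compares it before and after strictly increasing transformations of the margins; the linear correlation of the transformed sample will in general differ.

```python
import math
import random


def kendall_tau(xs, ys):
    """Sample Kendall's tau: (concordant - discordant) / number of pairs."""
    n, score = len(xs), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (prod > 0) - (prod < 0)
    return 2.0 * score / (n * (n - 1))


def pearson(xs, ys):
    """Sample linear (Pearson) correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def gaussian_sample(rho, size, seed=0):
    """Bivariate standard normal sample with linear correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(size):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return xs, ys
```

For example, with `xs, ys = gaussian_sample(0.7, 800)`, the value `kendall_tau(xs, ys)` should be close to $(2/\pi) \arcsin(0.7) \approx 0.49$, and it is unchanged when `xs` is replaced by `[math.exp(x) for x in xs]` and `ys` by `[y ** 3 for y in ys]`, whereas `pearson` is not invariant under such transformations.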


3.5 Ranked Scatterplots


Let us draw ranked scatterplots with different copulas.

First we consider Archimedean copulas (with parameter $\theta$), namely Clayton,
Clayton-mirror (i.e. when we flip it) and Gumbel, and elliptical ones, namely the
Student copula with $\nu = 1$ and the Gaussian one. For all of them, we choose the
Kendall's tau $\rho_\tau = 50\%$.


[Figure: ranked scatterplots; panel titles: Clayton, Gumbel, Student $\nu = 1$]


Then we consider the same margins and play with the parameters of the copulas
to see their impact.

[Figure: ranked scatterplots for the Gaussian copula with $\rho = 0, 30, 60, 90\%$ from left to right]


3.6 Other Type of Dependence: Tail or Extremal Dependence 


The objective is to measure the dependence in the joint tail of a bivariate distribution. Let
$C$ denote the copula of the random vector $(X_1, X_2)$.


• Coefficient of upper tail dependence. When the limit exists, it is defined as

$$\lambda_u(X_1, X_2) = \lim_{\alpha \to 1} P\big[ X_2 > \mathrm{VaR}_\alpha(X_2) \mid X_1 > \mathrm{VaR}_\alpha(X_1) \big]$$

and, as a function of the copula $C$,

$$\lambda_u(X_1, X_2) = \lim_{\alpha \to 1} \frac{1 - 2\alpha + C(\alpha, \alpha)}{1 - \alpha}.$$

• Coefficient of lower tail dependence. When the limit exists, it is defined as

$$\lambda_l(X_1, X_2) = \lim_{\alpha \to 0} P\big[ X_2 \le \mathrm{VaR}_\alpha(X_2) \mid X_1 \le \mathrm{VaR}_\alpha(X_1) \big]$$

and, as a function of $C$,

$$\lambda_l(X_1, X_2) = \lim_{\alpha \to 0} \frac{C(\alpha, \alpha)}{\alpha}.$$

3.6.1 Properties and Terminology 


1. $\lambda_u \in [0, 1]$ and $\lambda_l \in [0, 1]$;

2. For elliptical copulas, $\lambda_u = \lambda_l := \lambda$. Note that this is true for all copulas with
radial symmetry, i.e. such that $(U_1, U_2) \overset{d}{=} (1 - U_1, 1 - U_2)$;

3. If $\lambda_u \in (0, 1]$, then there exists an upper tail dependence and if $\lambda_l \in (0, 1]$, there
exists a lower tail dependence;

4. $\lambda_u = 0$ means that there is asymptotic independence in the upper tail and $\lambda_l = 0$
means that there is asymptotic independence in the lower tail.


Examples


1. We can prove that a Gaussian copula with parameter $\rho$ is asymptotically
independent (i.e. $\lambda = 0$) whenever $|\rho| < 1$;

2. A t-copula with parameter $\rho$ is tail dependent whenever $\rho > -1$, whatever
the number of degrees of freedom $\nu$. Its coefficient of (lower and upper) tail
dependence is given by: $\lambda = 2\, t_{\nu + 1}\left( -\sqrt{\frac{(\nu + 1)(1 - \rho)}{1 + \rho}} \right)$;

3. The Gumbel copula with parameter $\beta$ is upper tail dependent for $\beta > 1$, and this
upper tail dependence is measured by $\lambda_u = 2 - 2^{1/\beta}$;

4. The Clayton copula with parameter $\beta$ is lower tail dependent for $\beta > 0$, and
$\lambda_l = 2^{-1/\beta}$.


The properties of symmetric tail dependence, as well as of asymptotic
independence for the Gaussian copula and upper tail dependence for the Student
copula, are well illustrated in Fig. 11.
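
The closed-form coefficients quoted for the Gumbel and Clayton copulas can be checked against the copula-based limits, evaluated close to the boundary. This is a numerical sketch only; the diagonal expressions follow from the copula formulas of Sect. 3.3, and the function names and evaluation points are ours.

```python
def gumbel_diag(a, beta):
    """C(a, a) for the Gumbel copula, which simplifies to a ** (2 ** (1 / beta))."""
    return a ** (2.0 ** (1.0 / beta))


def clayton_diag(a, beta):
    """C(a, a) for the Clayton copula."""
    return (2.0 * a ** -beta - 1.0) ** (-1.0 / beta)


def upper_tail(diag, a=1.0 - 1e-6):
    """Approximate lambda_u by (1 - 2a + C(a, a)) / (1 - a) for a close to 1."""
    return (1.0 - 2.0 * a + diag(a)) / (1.0 - a)


def lower_tail(diag, a=1e-6):
    """Approximate lambda_l by C(a, a) / a for a close to 0."""
    return diag(a) / a


beta = 2.0
lambda_u_gumbel = upper_tail(lambda a: gumbel_diag(a, beta))    # compare with 2 - 2 ** (1 / beta)
lambda_l_clayton = lower_tail(lambda a: clayton_diag(a, beta))  # compare with 2 ** (-1 / beta)
lambda_u_clayton = upper_tail(lambda a: clayton_diag(a, beta))  # expected to be close to 0
```

The Clayton copula thus shows dependence in the lower tail only, and the Gumbel copula in the upper tail.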
Fig. 11 Gaussian (left) and Student $t_3$ (right) copulas with the same margins and parameter $\rho = 70\%$.
Quantile lines are given for 0.5% and 99.5%


Table 10 Left table: joint tail probabilities $P[X_1 > \mathrm{VaR}_\alpha(X_1), X_2 > \mathrm{VaR}_\alpha(X_2)]$ for $\alpha =$
95, 99, 99.5, 99.9%, respectively. Right table: joint tail probabilities $P[X_i > \mathrm{VaR}_{99\%}(X_i), i = 1, \ldots, d]$
for $d = 2, 3, 4, 5$ respectively, when taking equal correlations

Left table:
$\rho$ | C | Quantile 95% | 99% | 99.5% | 99.9%
0.5 | N | $1.21 \times 10^{-2}$ | $1.29 \times 10^{-3}$ | $4.96 \times 10^{-4}$ | $5.42 \times 10^{-5}$
0.5 | t8 | 1.20 | 1.65 | 1.94 | 3.01
0.5 | t4 | 1.39 | 2.22 | 2.79 | 4.86
0.5 | t3 | 1.50 | 2.55 | 3.26 | 5.83
0.7 | N | $1.95 \times 10^{-2}$ | $2.67 \times 10^{-3}$ | $1.14 \times 10^{-3}$ | $1.60 \times 10^{-4}$
0.7 | t8 | 1.11 | 1.33 | 1.46 | 1.86
0.7 | t4 | 1.21 | 1.60 | 1.82 | 2.52
0.7 | t3 | 1.27 | 1.74 | 2.01 | 2.83

Right table:
$\rho$ | C | Dimension d = 2 | 3 | 4 | 5
0.5 | N | $1.29 \times 10^{-3}$ | $3.66 \times 10^{-4}$ | $1.49 \times 10^{-4}$ | $7.48 \times 10^{-5}$
0.5 | t8 | 1.65 | 2.36 | 3.09 | 3.82
0.5 | t4 | 2.22 | 3.82 | 5.66 | 7.68
0.5 | t3 | 2.55 | 4.72 | 7.35 | 10.34
0.7 | N | $2.67 \times 10^{-3}$ | $1.28 \times 10^{-3}$ | $7.77 \times 10^{-4}$ | $5.35 \times 10^{-4}$
0.7 | t8 | 1.33 | 1.58 | 1.78 | 1.95
0.7 | t4 | 1.60 | 2.10 | 2.53 | 2.91
0.7 | t3 | 1.74 | 2.39 | 2.97 | 3.45

For both tables: The copula C of the random vector is either Gaussian (denoted by N) or Student
t with three possible degrees of freedom $\nu = 8, 4, 3$ (the smaller $\nu$, the heavier the tail) and
parameter $\rho = 50\%$ or $70\%$. Note that for the Student cases, only the factor by which the Gaussian
(N) joint tail probability must be multiplied is given


3.6.2 Numerical Example Showing the Impact of the Choice of Copula


We already provided in Sect. 3.1.2 an example with political risks where we
observed how much the choice of the dependence structure would impact the results.
In that example we considered a Gaussian dependence versus a Clayton one.

Let us give another example where we compare in Table 10 the joint tail 
probability at finite levels of two copulas which are both elliptical. This is an 
example developed by McNeil et al. in [27]. 

Let us illustrate those results, giving the financial interpretation suggested in [27]. 

Consider daily returns on five financial instruments and suppose that we believe
that all correlations between returns are equal to 50%. However, we are unsure
about the best multivariate model for these data. On the one hand, if returns follow
a multivariate Gaussian distribution, then the probability that on any day all returns
fall below their 1% quantiles is $7.48 \times 10^{-5}$. In the long run such an event will
happen once every 13,369 trading days on average, that is roughly once every
51.4 years (assuming 260 trading days in a year). On the other hand, if returns follow
a multivariate $t$ distribution with four degrees of freedom, then such an event will
happen 7.68 times more often, that is roughly once every 6.7 years, which would
induce a very different behavior in terms of risk management! During the subprime
crisis, this was the problem with the overly high ratings given to CDOs (Collateralized
Debt Obligations) by the rating agencies, who only considered linear correlation for
the dependence between the risks.
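
The return periods quoted above follow from simple arithmetic on the entries of Table 10, as the following short sketch shows (the function name is ours).

```python
def return_period(prob_per_day, trading_days_per_year=260):
    """Average waiting time, in trading days and in years, for a daily event."""
    days = 1.0 / prob_per_day
    return days, days / trading_days_per_year


# Gaussian copula, d = 5, rho = 0.5: probability 7.48e-5 per day
gauss_days, gauss_years = return_period(7.48e-5)

# t4 copula: the same event is 7.68 times more likely
t4_days, t4_years = return_period(7.68 * 7.48e-5)
```

Rounding `gauss_days`, `gauss_years` and `t4_years` recovers the figures of 13,369 days, 51.4 years and 6.7 years.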


4 Multivariate EVT 


Let us end these notes by giving a brief idea of the basis on which EVT has been
extended to the multivariate setting. It is a research domain which has attracted
increasing interest over the past decade, in particular due to its practical use.


4.1 MEV Distribution 


Some Notation Let $\mathbf{X}_1, \ldots, \mathbf{X}_i, \ldots, \mathbf{X}_n$ be iid random vectors in $\mathbb{R}^d$, each $\mathbf{X}_i$
($i = 1, \ldots, n$) having its components denoted by $X_{ij}$, $j = 1, \ldots, d$; they could be
interpreted as losses of $d$ different types. Let $F$ be the joint cdf of any random vector
$\mathbf{X}_i$ and $F_1, \ldots, F_d$ be its marginal cdf's. Let $M_{nj} = \max_{1 \le i \le n} X_{ij}$, for $j = 1, \ldots, d$;
it is the maximum of the $j$th component, and let $\mathbf{M}_n$ be the $d$-dimensional random vector
of componentwise block maxima, i.e. with components $M_{nj}$, $j = 1, \ldots, d$.


The main question that might be asked, when going from univariate EVT to 
multivariate one, is which underlying multivariate cdf’s F are attracted to which 
MEV distributions H? 


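Definition 4 If there exist vectors of normalizing constants (of dimension $d$) $\mathbf{c}_n > 0$
and $\mathbf{d}_n$ such that $(\mathbf{M}_n - \mathbf{d}_n)/\mathbf{c}_n$ converges in distribution to a random vector with
joint (non-degenerate) cdf $H$, i.e.

$$P\left[ \frac{\mathbf{M}_n - \mathbf{d}_n}{\mathbf{c}_n} \le \mathbf{x} \right] = F^n(\mathbf{c}_n \mathbf{x} + \mathbf{d}_n) \underset{n \to \infty}{\longrightarrow} H(\mathbf{x}), \quad \mathbf{x} \in \mathbb{R}^d,$$

we say that $F$ is in the Maximum Domain of Attraction of $H$, written $F \in MDA(H)$,
and we refer to $H$ as a MEV (Multivariate Extreme Value) distribution.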


If H has non-degenerate margins, then 


— these margins are univariate EV distributions of one of the three types, by
application of univariate EVT;
— via Sklar’s theorem, $H$ has a copula, which is unique if the margins are
continuous.


Theorem 6 If $F \in \mathrm{MDA}(H)$ for some $F$ and $H$ with GEV margins, then the
unique copula $C$ of $H$ satisfies the scaling property:

$$C^t(u) = C(u^t), \quad \forall u \in [0,1]^d, \ \forall t > 0,$$

which means that $C$ is an extreme value (EV) copula (as defined previously); it can
then be the copula of a MEV distribution.
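
As a numerical illustration of the scaling property, the following sketch checks it for the bivariate Gumbel copula, a standard example of an EV copula. The function name `gumbel_copula`, the parameter value and the test point are choices made for this illustration and do not come from the notes.

```python
import math

def gumbel_copula(u, v, theta):
    """Bivariate Gumbel copula with parameter theta >= 1 (illustrative choice)."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))

theta = 2.5
u, v = 0.3, 0.8
# Largest violation of C(u, v)^t = C(u^t, v^t) over a few values of t > 0.
max_gap = max(
    abs(gumbel_copula(u, v, theta) ** t - gumbel_copula(u ** t, v ** t, theta))
    for t in (0.25, 0.5, 2.0, 7.0)
)
print(f"largest gap between C^t(u) and C(u^t): {max_gap:.2e}")
```

The gap should be at the level of floating-point rounding, since for this copula the identity holds exactly.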


4.2 Copula Domain of Attraction 


We can state the following asymptotic theorem.


Theorem 7 ([14]) Let $F_i$, $i = 1, \dots, d$, be some continuous marginal cdf’s and
$C$ some copula. Define $F(x) = C(F_1(x_1), \dots, F_d(x_d))$ and let
$H(x) = C_0(H_1(x_1), \dots, H_d(x_d))$ be a MEV distribution with EV copula $C_0$. Then we have

$$F \in \mathrm{MDA}(H) \quad \text{if and only if} \quad \begin{cases} F_i \in \mathrm{MDA}(H_i) \ \text{for } i = 1, \dots, d, \ \text{and} \\ \lim_{t \to \infty} C^t(u^{1/t}) = C_0(u), \quad u \in [0,1]^d. \end{cases}$$

Notice that:


— the marginal distributions of F determine the margins of the MEV limit but are 
irrelevant to the determination of its dependence structure; 

— the copula Co of the limiting MEV distribution is determined solely by the copula 
C of the underlying distribution. 


Definition 5 If $\lim_{t \to \infty} C^t(u^{1/t}) = C_0(u)$, $u \in [0,1]^d$, for some $C$ and some EV
copula $C_0$, then we say that $C$ belongs to the copula domain of attraction of $C_0$:
$C \in \mathrm{CDA}(C_0)$.
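
The limit in this definition can be explored numerically. The sketch below uses the Clayton copula as a hypothetical example (it is not discussed in the notes); a short expansion suggests that its limit is the independence copula $C_0(u, v) = uv$, and the printed values are consistent with that.

```python
def clayton_copula(u, v, theta):
    """Bivariate Clayton copula with parameter theta > 0 (illustrative choice)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def cda_iterate(copula, u, v, t):
    """Evaluate C^t(u^(1/t), v^(1/t)), the quantity whose limit defines the CDA."""
    return copula(u ** (1.0 / t), v ** (1.0 / t)) ** t

theta = 2.0
u, v = 0.3, 0.6

def clayton(a, b):
    return clayton_copula(a, b, theta)

for t in (1.0, 10.0, 100.0, 10_000.0):
    value = cda_iterate(clayton, u, v, t)
    print(f"t = {t:>8.0f}: C^t(u^(1/t)) = {value:.6f}   (u * v = {u * v:.6f})")

limit_gap = abs(cda_iterate(clayton, u, v, 10_000.0) - u * v)
```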


4.2.1 Upper Tail Dependence and CDA 


Proposition 1 Let $C$ be a bivariate copula with upper tail-dependence coefficient
$\lambda_u$. Assume that $C \in \mathrm{CDA}(C_0)$ for some EV copula $C_0$ with Pickands (dependence)
function $A$. Then $\lambda_u$ is also the upper tail-dependence coefficient of $C_0$ and
is related to its dependence function by $\lambda_u = 2(1 - A(1/2))$.

Proof First, let us prove that $C$ and $C_0$ have the same $\lambda_u$. To do so, we just need to
check that

$$\lim_{\alpha \to 1^-} \frac{1 - C(\alpha, \alpha)}{1 - \alpha} = \lim_{\alpha \to 1^-} \frac{1 - C_0(\alpha, \alpha)}{1 - \alpha}.$$

We have, using the definition of $C \in \mathrm{CDA}(C_0)$,

$$\begin{aligned}
\lim_{\alpha \to 1^-} \frac{1 - C_0(\alpha, \alpha)}{1 - \alpha}
&= \lim_{\alpha \to 1^-} \frac{-\log C_0(\alpha, \alpha)}{1 - \alpha}
= \lim_{\alpha \to 1^-} \lim_{t \to \infty} \frac{t\,[1 - C(\alpha^{1/t}, \alpha^{1/t})]}{1 - \alpha} \\
&= \lim_{\alpha \to 1^-} \lim_{s \to 0^+} \frac{1 - C(\alpha^s, \alpha^s)}{-s \log(\alpha)}
= \lim_{\alpha \to 1^-} \lim_{s \to 0^+} \frac{1 - C(\alpha^s, \alpha^s)}{-\log(\alpha^s)}
= \lim_{\beta \to 1^-} \frac{1 - C(\beta, \beta)}{1 - \beta},
\end{aligned}$$

hence the result. The converse is straightforward. □


Consequence: $\lambda_u = 0 \Rightarrow A(1/2) = 1 \Rightarrow A \equiv 1$ (since $A$ is a convex function) $\Rightarrow$
$C_0$ is the independence copula.
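
The relation $\lambda_u = 2(1 - A(1/2))$ can be checked on a concrete EV copula. The sketch below uses the Gumbel copula, whose Pickands function is $A(w) = (w^\theta + (1-w)^\theta)^{1/\theta}$; this choice, the parameter value and the finite value of $q$ standing in for the limit $q \to 1$ are assumptions made for the illustration.

```python
def pickands_gumbel(w, theta):
    """Pickands dependence function of the Gumbel copula (illustrative choice)."""
    return (w ** theta + (1.0 - w) ** theta) ** (1.0 / theta)

def gumbel_diagonal(q, theta):
    """Gumbel copula on the diagonal: C(q, q) = q ** (2 * A(1/2))."""
    return q ** (2.0 * pickands_gumbel(0.5, theta))

theta = 2.0
lambda_from_pickands = 2.0 * (1.0 - pickands_gumbel(0.5, theta))

# Finite-q proxy for the limit 2 - (1 - C(q, q)) / (1 - q) as q -> 1.
q = 1.0 - 1e-6
lambda_from_limit = 2.0 - (1.0 - gumbel_diagonal(q, theta)) / (1.0 - q)

print(f"2(1 - A(1/2))    = {lambda_from_pickands:.6f}")
print(f"tail-limit proxy = {lambda_from_limit:.6f}")
```

For $\theta = 1$ the Gumbel copula reduces to the independence copula, and both quantities vanish, in line with the consequence stated above.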


5 Conclusion 


In these notes, we have explored EVT in both the univariate and the multivariate case,
the latter by looking at dependence between rv’s. We have seen that there are
mature methods for accurately determining the shape of the tail of the distribution.
There are also methods to backtest statistically whether the model captures it correctly.
This can be done when choosing Expected Shortfall as a risk measure (see e.g. [23]
and references therein). We pointed out through examples the importance for good
risk management of accounting for extreme risks, but also of correctly modeling the
non-linear dependence when present in the data or in the process to be studied. We
hope to have shown that there is no longer any excuse to ignore EVT in quantitative
risk management.


Monotone Sharpe Ratios and Related
Measures of Investment Performance


Mikhail Zhitlukhin


Abstract We introduce a new measure of performance of investment strategies, the
monotone Sharpe ratio. We study its properties, establish a connection with coherent
risk measures, and obtain an efficient representation for use in applications.


1 Introduction 


This paper concerns the problem of evaluation of performance of investment 
strategies. By performance, in a broad sense, we mean a numerical quantity which 
characterizes how good the return rate of a strategy is, so that an investor typically 
wants to find a strategy with high performance. 

Arguably, the best-known performance measure is the Sharpe ratio, the
ratio of the expectation of a future return, adjusted by a risk-free rate or another
benchmark, to its standard deviation. It was introduced by William F. Sharpe in
the 1966 paper [23]; a more modern treatment can be found in [24]. The Sharpe
ratio is based on the Markowitz mean-variance paradigm [14], which assumes that
investors need to care only about the mean rate of return of assets and the variance
of the rate of return: then, in order to find an investment strategy with the smallest
risk (identified with the variance of return) for a given desired expected return, one
just needs to find a strategy with the best Sharpe ratio and diversify appropriately
between this strategy and the risk-free asset (see a brief review in Sect. 2 below).
Despite its simplicity, as viewed from today’s economic science, the Markowitz
portfolio theory was a major breakthrough in mathematical finance. Even today,
more than 65 years later, analysts still routinely compute Sharpe ratios of investment
portfolios and use them, among other tools, to evaluate performance.

In the present paper we look at this theory in a new way and establish
connections with much more recent developments. The main part of the material
of the paper developed from the well-known observation that variance is not identical
to risk: roughly speaking, one has to distinguish between “variance above mean”
(which is good) and “variance below mean” (which is bad). In particular, the Sharpe
ratio lacks the property of monotonicity, i.e. there might exist an investment strategy
which always yields a return higher than another strategy, but has a smaller Sharpe
ratio. The original goal of this work was to study a modification of the Sharpe ratio
which makes it monotone. Some preliminary results were presented in [27, 28]. It
turned out that the modified Sharpe ratio possesses interesting properties and is tightly
connected to the theory of risk measures. The study of these properties is the subject
of this paper.

The modification of the Sharpe ratio we consider, which we call the monotone
Sharpe ratio, is defined as the maximum of the Sharpe ratios of all probability
distributions that are dominated by the distribution of the return of some given
investment strategy. In this paper we work only with ex ante performance measures,
i.e. we assume that probability distributions of returns are known or can be modeled,
and one needs to evaluate their performance; we leave aside the question of how to
construct appropriate models and calibrate them from data.

The theory we develop focuses on two aspects: on one hand, to place the new
performance measure on a modern theoretical foundation, and, on the other hand,
to take into account issues arising in applications, like the possibility of fast computation
and good properties of numerical results. Regarding the former aspect, we can
mention the paper of Cherny and Madan [3], who studied performance measures by
an axiomatic approach. The abstract theory of performance measures they proposed
is tightly related to the theory of convex and coherent risk measures, which has been
a major breakthrough in mathematical finance in the past two decades. We show
that the monotone Sharpe ratio satisfies those axioms, which allows us to apply results
from the theory of risk measures to it through the framework of Cherny and Madan.
We also establish a connection with more recently developed objects, the so-called
buffered probabilities, first introduced by Rockafellar and Royset in [21] and
now gaining popularity in applications involving optimization under uncertainty.
Roughly speaking, they are “nice” alternatives to optimization criteria involving
probabilities of adverse events, and lead to solutions of optimization problems which
have better mathematical properties compared to those obtained when standard
probabilities are used. One of the main implications of our results is that the portfolio
selection problem with the monotone Sharpe ratio is equivalent to minimization of the
buffered probability of loss.

Addressing the second aspect mentioned above, our main result here is a repre- 
sentation of the monotone Sharpe ratio as a solution of some convex optimization 
problem, which gives a computationally efficient way to evaluate it. Representations 
of various functionals in such a way are well-known in the literature on convex 
optimization. For example, in the context of finance, we can mention the famous 
result of Rockafellar and Uryasev [22] about the representation of the conditional 
value at risk. That paper also provides a good explanation of why such a representation
is useful in applications (we also give a brief account of that below).

Our representation also turns out to be useful in stochastic control problems
related to maximization of the Sharpe ratio in dynamic trading. Those problems
are known in the literature as examples of stochastic control problems where the
Bellman optimality principle cannot be directly applied. With our theory, we are
able to find the optimal strategies in a shorter and simpler way, compared to the
results previously known in the literature.

Finally, we would like to mention that a large number of performance measures
have been studied in the literature. See, for example, the papers [4, 5, 10],
which provide more than a hundred examples addressing various aspects of
evaluation of the quality of investment strategies. We believe that due to both the
theoretical foundation and the convenience for applications, the monotone Sharpe
ratio is a valuable contribution to the field.

The paper is organized as follows. In Sect. 2 we introduce the monotone Sharpe 
ratio and study its basic properties which make it a reasonable performance measure. 
There we also prove one of the central results, the representation as a solution of a 
convex optimization problem. In Sect. 3, we generalize the concept of the buffered 
probability and establish a connection with the monotone Sharpe ratio, as well 
as show how it can be used in portfolio selection problems. Section 4 contains 
applications to dynamic problems. 


2 The Monotone Sharpe Ratio 


2.1 Introduction: Markowitz Portfolio Optimization
and the Sharpe Ratio 


Consider a one-period market model, where an investor wants to distribute her initial
capital between $n + 1$ assets: one riskless asset and $n$ risky assets. Assume that the
risky assets yield returns $R_i$, $i = 1, \dots, n$, so that one dollar invested “today” in asset $i$
turns into $1 + R_i$ dollars “tomorrow”; the rates of return $R_i$ are random variables with
known distributions, such that $R_i > -1$ with probability 1 (no bankruptcies happen).
The rate of return of the riskless asset is constant, $R_0 = r > -1$. We always assume
that the probability distributions of $R_i$ are known and given, and, for example, do
not consider the question of how to estimate them from past data. In other words, we
always work with ex ante performance measures (see [24]).

An investment portfolio of the investor is identified with a vector $x \in \mathbb{R}^{n+1}$,
where $x_i$ is the proportion of the initial capital invested in asset $i$. In particular,
$\sum_i x_i = 1$. Some coordinates $x_i$ may be negative, which is interpreted as short sales
($i = 1, \dots, n$) or loans ($i = 0$). It is easy to see that the total return of the portfolio
is $R_x = \langle x, R \rangle = \sum_i x_i R_i$.

The Markowitz model prescribes that the investor choose the optimal investment
portfolio in the following way: she should decide what expected return $E R_x$ she
wants to achieve, and then find the portfolio $x$ which minimizes the variance of the
return $\mathrm{Var}\, R_x$. This leads to the quadratic optimization problem:

$$\begin{aligned}
&\text{minimize} && \mathrm{Var}\, R_x \ \text{ over } x \in \mathbb{R}^{n+1} \\
&\text{subject to} && E R_x = \mu, \\
&&& \textstyle\sum_i x_i = 1.
\end{aligned} \tag{1}$$


Under mild conditions on the joint distribution of $R_i$, there exists a unique solution
$x^*$, which can be easily written explicitly in terms of the covariance matrix and the
vector of expected returns of $R_i$ (the formula can be found in any textbook on the
subject, see, for example, Chapter 2.4 in [18]).

It turns out that the points $(\sigma_{x^*}, \mu_{x^*})$, where $\sigma_{x^*} = \sqrt{\mathrm{Var}\, R_{x^*}}$, $\mu_{x^*} = E R_{x^*}$,
corresponding to the optimal portfolios for all possible expected returns $\mu \in [r, \infty)$, lie
on a straight line in the plane $(\sigma, \mu)$, called the efficient frontier. This is the set of
portfolios the investor should choose from: any portfolio below this line is inferior
to some efficient portfolio (i.e. has the same expected return but larger variance),
and there are no portfolios above the efficient frontier.
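
The straight-line shape of the efficient frontier can be seen in a small numerical sketch. The market below (risk-free rate, two risky assets, their means and covariances) is entirely hypothetical, and the code relies on the standard textbook fact that the risky weights of a minimum-variance portfolio are proportional to the inverse covariance matrix applied to the vector of excess expected returns.

```python
import math

# Hypothetical market: risk-free rate r and two risky assets.
r = 0.02
mean = [0.08, 0.12]                      # expected returns of the risky assets
cov = [[0.04, 0.01],
       [0.01, 0.09]]                     # covariance matrix of their returns

# Solve cov * z = mean - r for z by inverting the 2x2 matrix explicitly.
excess = [m - r for m in mean]
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
z = [(cov[1][1] * excess[0] - cov[0][1] * excess[1]) / det,
     (cov[0][0] * excess[1] - cov[1][0] * excess[0]) / det]

def frontier_point(target_mu):
    """Return (sigma, mu) of the minimum-variance portfolio with mean target_mu."""
    scale = (target_mu - r) / sum(e * zi for e, zi in zip(excess, z))
    y = [scale * zi for zi in z]         # risky weights; the rest goes to the riskless asset
    mu = r + sum(e * yi for e, yi in zip(excess, y))
    var = sum(y[i] * cov[i][j] * y[j] for i in range(2) for j in range(2))
    return math.sqrt(var), mu

slopes = []
for target in (0.04, 0.06, 0.10, 0.15):
    sigma, mu = frontier_point(target)
    slopes.append((mu - r) / sigma)
    print(f"mu = {mu:.2f}, sigma = {sigma:.4f}, slope = {slopes[-1]:.4f}")
```

The ratio $(\mu - r)/\sigma$ is the same for every target return, which is what it means for the points to lie on a line through $(0, r)$.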

The slope of the efficient frontier is equal to the Sharpe ratio of any efficient
portfolio containing a non-zero amount of risky assets (those portfolios have the
same Sharpe ratio). Recall that the Sharpe ratio of return $R$ is defined as the ratio of
the expected return adjusted by the risk-free rate to its standard deviation

$$S(R) = \frac{E(R - r)}{\sqrt{\mathrm{Var}\, R}}.$$

In particular, to solve problem (1), it is enough to find some efficient portfolio $\tilde x$, and
then any other efficient portfolio can be constructed as a combination of the riskless
portfolio $x_0 = (1, 0, \dots, 0)$ and $\tilde x$, i.e. $x^* = (1 - \lambda) x_0 + \lambda \tilde x$, where $\lambda \in [0, +\infty)$ is
chosen to satisfy $E R_{x^*} = \mu \ge r$. This is basically the statement of the Mutual Fund
Theorem. Thus, the Sharpe ratio can be considered as a measure of performance of
an investment portfolio, and an investor is interested in finding a portfolio with the
highest performance. In practice, broad market indices can be considered as quite
close to efficient portfolios.

The main part of the material in this paper grew from the observation that
the Sharpe ratio is not monotone: for two random variables $X$, $Y$ the inequality
$X \le Y$ a.s. does not imply the same inequality between their Sharpe ratios, i.e.
that $S(X) \le S(Y)$. Here is an example: let $X$ have the normal distribution with
mean 1 and variance 1 and $Y = X \wedge 1$; obviously, $S(X) = 1$ but one can compute
that $S(Y) > 1$. From the point of view of the portfolio selection problem, this fact
means that it is possible to increase the Sharpe ratio by disposing of part of the return
(or consuming it). This does not agree well with the common-sense interpretation of
efficiency. Therefore, one may want to look for a replacement of the Sharpe ratio
which does not have such an unnatural property.
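
The value of $S(Y)$ in the example with a normal return can be computed in closed form. The sketch below takes the risk-free rate to be zero (which is what makes $S(X) = 1$) and writes $Y = 1 + \min(Z, 0)$ with $Z$ standard normal; the two moments of $\min(Z, 0)$ used in the comments are standard facts about the normal distribution.

```python
import math

# X ~ N(1, 1) and Y = min(X, 1) = 1 + min(Z, 0) with Z ~ N(0, 1); risk-free rate taken as 0.
mean_min_z = -1.0 / math.sqrt(2.0 * math.pi)       # E min(Z, 0)
second_moment_min_z = 0.5                          # E min(Z, 0)^2 = E[Z^2; Z < 0]
var_y = second_moment_min_z - mean_min_z ** 2      # Var Y = Var min(Z, 0)
mean_y = 1.0 + mean_min_z

sharpe_x = 1.0 / 1.0                               # E X / sqrt(Var X)
sharpe_y = mean_y / math.sqrt(var_y)

print(f"S(X) = {sharpe_x:.4f}")
print(f"S(Y) = {sharpe_y:.4f}")
```

Although $Y \le X$ in every scenario, the truncated return $Y$ has the larger Sharpe ratio, because cutting off the upper tail reduces the standard deviation by more than it reduces the mean.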
In this paper we’ll use the following simple idea: if it is possible to increase the
Sharpe ratio by disposing of a part of the return, let’s define the new performance
measure as the maximum Sharpe ratio that can be achieved by such a disposal.
Namely, define the new functional by

$$X \mapsto \sup_{C \ge 0} S(X - C),$$

where the supremum is over all non-negative random variables $C$ (defined on the
same probability space as $X$), which represent the disposed return. In the rest of this
section, we’ll study such functionals and how they can be used in portfolio selection
problems. We’ll work in a more general setting and consider not only the ratio of
expected return to standard deviation of return but also ratios of expected return to
deviations in $L^p$. The corresponding definitions will be given below.


2.2 The Definition of the Monotone Sharpe Ratio
and Its Representation 


In this section we’ll treat random variables as returns of some investment strategies,
unless otherwise stated. That is, large values are good, small values are bad. Without
loss of generality, we’ll assume that the risk-free rate is zero; otherwise one can
replace a return $X$ with $X - r$, and all the results will remain valid.

First we give the definition of a deviation measure in $L^p$, $p \in [1, \infty)$, which will
be used in the denominator of the Sharpe ratio instead of the standard deviation (the
latter is the particular case $p = 2$). Everywhere below, $\|\cdot\|_p$ denotes the norm
in $L^p$, i.e. $\|X\|_p = (E|X|^p)^{1/p}$.


Definition 1 We define the $L^p$-deviation of a random variable $X \in L^p$ as

$$\sigma_p(X) = \min_{c \in \mathbb{R}} \|X - c\|_p.$$

In the particular case $p = 2$, as is well known, $\sigma_2(X)$ is the standard deviation, and
the minimizer is $c^* = EX$. For $p = 1$, the minimizer is $c^* = \mathrm{med}(X)$, the median
of the distribution of $X$, so that $\sigma_1(X)$ is the absolute deviation from the median. It
is possible to use other deviation measures to define the monotone Sharpe ratio, for
example $\|X - EX\|_p$, but the definition given above seems to be the most convenient
for our purposes.
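
The two special cases can be checked on a small sample. The sketch below computes $\sigma_p$ for the empirical distribution of a hypothetical, equally weighted sample by a ternary search over $c$ (the map $c \mapsto \|X - c\|_p$ is convex); the function name `lp_deviation` and the data are assumptions made for this illustration.

```python
import statistics

def lp_deviation(sample, p, iterations=200):
    """sigma_p of the empirical distribution of `sample`: min over c of ||X - c||_p."""
    def norm(c):
        return (sum(abs(x - c) ** p for x in sample) / len(sample)) ** (1.0 / p)

    lo, hi = min(sample), max(sample)     # a minimizer always lies in this range
    for _ in range(iterations):           # ternary search on the convex map c -> ||X - c||_p
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if norm(m1) <= norm(m2):
            hi = m2
        else:
            lo = m1
    return norm(0.5 * (lo + hi))

sample = [-1.0, 0.5, 1.0, 2.0, 7.5]       # hypothetical returns, equally likely
median = statistics.median(sample)
mean_abs_dev = sum(abs(x - median) for x in sample) / len(sample)

sigma_2 = lp_deviation(sample, 2)
sigma_1 = lp_deviation(sample, 1)
print(f"sigma_2 = {sigma_2:.6f}, population std  = {statistics.pstdev(sample):.6f}")
print(f"sigma_1 = {sigma_1:.6f}, mean |X - med| = {mean_abs_dev:.6f}")
```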

Observe that $\sigma_p$ obviously satisfies the following properties, which will be used
later: (a) it is sublinear; (b) it is uniformly continuous on $L^p$; (c) $\sigma_p(X) = 0$ if and
only if $X$ is a constant a.s.; (d) for any $\sigma$-algebra $\mathcal{G} \subset \mathcal{F}$, where $\mathcal{F}$ is the original
$\sigma$-algebra on the underlying probability space for $X$, we have $\sigma_p(E(X \mid \mathcal{G})) \le \sigma_p(X)$;
(e) if $X$ and $Y$ have the same distributions, then $\sigma_p(X) = \sigma_p(Y)$.




Definition 2 The monotone Sharpe ratio in $L^p$ of a random variable $X \in L^p$ is
defined by

$$S_p(X) = \sup_{Y \le X} \frac{EY}{\sigma_p(Y)}, \tag{2}$$

where the supremum is over all $Y \in L^p$ such that $Y \le X$ a.s. For $X = 0$ a.s. we set
by definition $S_p(0) = 0$.


One can easily see that if $p > 1$, then $S_p(X)$ assumes values in $[0, \infty]$. Indeed, if
$EX \le 0$, then $S_p(X) = 0$ as it is possible to take $Y \le X$ with arbitrarily large
$L^p$-deviation keeping $EY$ bounded. On the other hand, if $X \ge 0$ a.s. and $P(X > 0) \ne 0$,
then $S_p(X) = +\infty$ as one can consider $Y_\varepsilon = \varepsilon I(X \ge \varepsilon)$ with $\varepsilon \to 0$ for which
$EY_\varepsilon / \sigma_p(Y_\varepsilon) \to \infty$.

Thus, the main case of interest will be when $EX > 0$ and $P(X < 0) \ne 0$; then
$0 < S_p(X) < \infty$. For this case, the following theorem provides the representation
of $S_p$ as a solution of some convex optimization problem.


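Theorem 1 Suppose $X \in L^p$ and $E(X) > 0$, $P(X < 0) \ne 0$. Then the following
representations of the monotone Sharpe ratio are valid.

1) For $p \in (1, \infty)$ with $q$ such that $\frac{1}{p} + \frac{1}{q} = 1$:

$$(S_p(X))^q = \max_{a, b \in \mathbb{R}} \left\{ b - E\left( \frac{q - 1}{q^p} \left| (aX + b)_+ - q \right|^p + (aX + b)_+ \right) \right\}. \tag{3}$$

2) For $p = 1, 2$:

$$\frac{1}{1 + (S_p(X))^p} = \min_{c \in \mathbb{R}} E(1 - cX)_+^p. \tag{4}$$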


The main point of this theorem is that it allows one to reduce the problem of
computing $S_p$ as the supremum over a set of random variables to an optimization
problem with one or two real parameters and a convex objective function. The
latter problem is much easier than the former, since there exist efficient
algorithms for numerical convex optimization. This gives a convenient way to
compute $S_p(X)$ (though only numerically, unlike the standard Sharpe ratio). We’ll
also see that the representation is useful for establishing some theoretical results
about $S_p$.
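
To make the computational point concrete, here is a minimal sketch that evaluates $S_1$ and $S_2$ for an equally weighted sample through the one-parameter representation $1/(1 + S_p^p) = \min_c E(1 - cX)_+^p$ from the theorem. The function name, the ternary search, the search bracket and the sample are assumptions made for this illustration; a library convex solver could be used instead.

```python
def monotone_sharpe(sample, p, iterations=200):
    """Monotone Sharpe ratio S_p (p = 1 or 2) of an equally weighted sample.

    Uses 1 / (1 + S_p^p) = min over c of E (1 - c X)_+^p and assumes the
    main case E X > 0 and P(X < 0) > 0.
    """
    n = len(sample)
    if sum(sample) <= 0 or min(sample) >= 0:
        raise ValueError("need E X > 0 and P(X < 0) > 0")

    def objective(c):
        return sum(max(1.0 - c * x, 0.0) ** p for x in sample) / n

    # The objective is convex, equals 1 with negative slope at c = 0, and the worst
    # outcome alone pushes it above 1 beyond `hi`, so a minimizer lies in [0, hi].
    lo, hi = 0.0, (n ** (1.0 / p) - 1.0) / abs(min(sample))
    for _ in range(iterations):           # ternary search on a convex function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    minimum = objective(0.5 * (lo + hi))
    return (1.0 / minimum - 1.0) ** (1.0 / p)

returns = [-1.0, 1.0, 1.0, 1.0, 10.0]     # hypothetical returns, equally likely
mean = sum(returns) / len(returns)
std = (sum((x - mean) ** 2 for x in returns) / len(returns)) ** 0.5

s1 = monotone_sharpe(returns, 1)
s2 = monotone_sharpe(returns, 2)
print(f"standard Sharpe ratio  = {mean / std:.4f}")
print(f"monotone Sharpe, p = 2 = {s2:.4f}")
print(f"monotone Sharpe, p = 1 = {s1:.4f}")
```

For this sample the optimal $c$ for $p = 2$ is $1/2$, which corresponds to truncating the return at the level 2; the truncated sample $(-1, 1, 1, 1, 2)$ has ordinary Sharpe ratio $\sqrt{2/3}$, matching the computed $S_2$.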

For the proof, we need the following auxiliary lemma. 


Lemma 1 Suppose $X \in L^p$, $p \in [1, \infty)$, and $q$ is such that $\frac{1}{p} + \frac{1}{q} = 1$. Then

$$\sigma_p(X) = \max\{E(RX) \mid R \in L^q,\ ER = 0,\ \|R\|_q \le 1\}.$$

Proof Suppose $\sigma_p(X) = \|X - c^*\|_p$. By Hölder’s inequality, for any $R \in L^q$ with
$ER = 0$ and $\|R\|_q \le 1$ we have

$$E(RX) = E(R(X - c^*)) \le \|R\|_q \|X - c^*\|_p \le \|X - c^*\|_p.$$

On the other hand, the two inequalities turn into equalities for

$$R^* = \frac{\mathrm{sgn}(X - c^*) \cdot |X - c^*|^{p-1}}{\|X - c^*\|_p^{p-1}},$$

and $R^*$ satisfies the above constraints.
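
The extremal element $R^*$ can be checked numerically on a sample. The sketch below takes $p = 3$ and a hypothetical, equally weighted sample, finds $c^*$ by ternary search, builds $R^*$ from the formula in the proof, and reports $E(R^* X)$, $E R^*$ and $\|R^*\|_q$; all names are chosen for this illustration, and the constraint $E R^* = 0$ holds only up to the accuracy with which $c^*$ is found.

```python
def dual_certificate(sample, p, iterations=200):
    """Return (sigma_p, E(R* X), E R*, ||R*||_q) for an equally weighted sample, p > 1."""
    n = len(sample)

    def norm(c):
        return (sum(abs(x - c) ** p for x in sample) / n) ** (1.0 / p)

    lo, hi = min(sample), max(sample)
    for _ in range(iterations):           # ternary search for the minimizer c*
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if norm(m1) <= norm(m2):
            hi = m2
        else:
            lo = m1
    c_star = 0.5 * (lo + hi)
    sigma = norm(c_star)

    def sign(value):
        return (value > 0) - (value < 0)

    # R* = sgn(X - c*) |X - c*|^(p-1) / ||X - c*||_p^(p-1)
    r_star = [sign(x - c_star) * abs(x - c_star) ** (p - 1) / sigma ** (p - 1)
              for x in sample]
    q = p / (p - 1.0)
    pairing = sum(r * x for r, x in zip(r_star, sample)) / n
    mean_r = sum(r_star) / n
    norm_r = (sum(abs(r) ** q for r in r_star) / n) ** (1.0 / q)
    return sigma, pairing, mean_r, norm_r

sigma, pairing, mean_r, norm_r = dual_certificate([-1.0, 0.5, 1.0, 2.0, 7.5], p=3)
print(f"sigma_p = {sigma:.6f}, E(R* X) = {pairing:.6f}")
print(f"E R* = {mean_r:.2e}, ||R*||_q = {norm_r:.6f}")
```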


Proof (Proof of Theorem 1) Without loss of generality, assume $EX = 1$. First
we’re going to show that $S_p$ can be represented through the following optimization
problem:

$$S_p(X) = \inf_{R \in L^q} \{\|R\|_q \mid R \le 1 \text{ a.s.},\ ER = 0,\ E(RX) = 1\}. \tag{5}$$

In (2), introduce the new variables: $c = (EY)^{-1} \in \mathbb{R}$ and $Z = cY \in L^p$. Then

$$\frac{1}{S_p(X)} = \inf_{Z \in L^p,\ c \in \mathbb{R}} \{\sigma_p(Z) \mid Z \le cX,\ EZ = 1\}.$$

Consider the dual of the optimization problem in the RHS (see the Appendix for
a brief overview of duality methods in optimization). Define the dual objective
function $g\colon L^q_+ \times \mathbb{R} \to \mathbb{R}$ by

$$g(u, v) = \inf_{Z \in L^p,\ c \in \mathbb{R}} \{\sigma_p(Z) + E(u(Z - cX)) - v(EZ - 1)\}.$$

The dual problem consists in maximizing $g(u, v)$ over all $u \in L^q_+$, $v \in \mathbb{R}$. We want
to show that the strong duality takes place, i.e. that the values of the primal and the
dual problems are equal:

$$\frac{1}{S_p(X)} = \sup_{u \in L^q_+,\ v \in \mathbb{R}} g(u, v).$$

To verify the sufficient condition for the strong duality from Theorem 7, introduce
the optimal value function $\varphi\colon L^p \times \mathbb{R} \to [-\infty, \infty)$,

$$\varphi(a, b) = \inf_{Z \in L^p,\ c \in \mathbb{R}} \{\sigma_p(Z) \mid Z - cX \le a,\ EZ - 1 = b\}$$

(obviously, $(S_p(X))^{-1} = \varphi(0, 0)$). Observe that if a pair $(Z_1, c_1)$ satisfies the
constraints in $\varphi(a_1, b_1)$ then the pair $(Z_2, c_2)$ with

$$c_2 = c_1 + b_2 - b_1 + E(a_1 - a_2), \qquad Z_2 = Z_1 + a_2 - a_1 + (c_2 - c_1)X,$$

satisfies the constraints in $\varphi(a_2, b_2)$. Clearly,
$\|Z_1 - Z_2\|_p + |c_1 - c_2| = O(\|a_1 - a_2\|_p + |b_1 - b_2|)$, which implies that
$\varphi(a, b)$ is continuous, so the strong duality holds.

Let us now transform the dual problem. It is obvious that if $E(uX) \ne 0$, then
$g(u, v) = -\infty$ (minimize over $c$). For $u$ such that $E(uX) = 0$, using the dual
representation of $\sigma_p(X)$, we can write

$$g(u, v) = \inf_{Z \in L^p} \sup_{R \in \mathcal{R}} E(Z(R + u - v) + v) \quad \text{if } E(uX) = 0,$$

where $\mathcal{R} = \{R \in L^q : ER = 0,\ \|R\|_q \le 1\}$ is the dual set for $\sigma_p$ from Lemma 1.
Observe that the set $\mathcal{R}$ is compact in the weak-$*$ topology by the Banach-Alaoglu
theorem. Consequently, by the minimax theorem (see Theorem 8), the supremum
and infimum can be swapped. Then it is easy to see that $g(u, v) > -\infty$ only if there
exists $R \in \mathcal{R}$ such that $R + u - v = 0$ a.s., and in this case $g(u, v) = v$. Therefore,
the dual problem can be written as follows:

$$\begin{aligned}
\frac{1}{S_p(X)} &= \sup_{u \in L^q_+,\ v \in \mathbb{R}} \{v \mid u \ge 0 \text{ a.s.},\ E(uX) = 0,\ v - u \in \mathcal{R}\} \\
&= \sup_{R \in \mathcal{R}} \{E(RX) \mid R \le E(RX) \text{ a.s.}\} \\
&= \sup_{R \in L^q} \{E(RX) \mid R \le E(RX) \text{ a.s.},\ ER = 0,\ \|R\|_q \le 1\},
\end{aligned}$$

where in the second equality we used that if $v - u = R \in \mathcal{R}$, then the second
constraint implies that $v = E(RX)$ since it is assumed that $EX = 1$. Now by changing
the variable $R$ to $R/E(RX)$ in the right-hand side, we obtain representation (5).
From (5), it is obvious that for $p > 1$

$$(S_p(X))^q = \inf_{R \in L^q} \{E|R|^q \mid R \le 1 \text{ a.s.},\ ER = 0,\ E(RX) = 1\}. \tag{6}$$


We’ll now consider the optimization problem dual to this one. Denote its optimal
value function by $\varphi\colon L^q \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$. It will be more convenient to change the
optimization variable $R$ here by $1 - R$ (which clearly doesn’t change the value of
$\varphi$), so that

$$\varphi(a, b, c) = \inf_{R \in L^q} \{E|R - 1|^q \mid R \ge a \text{ a.s.},\ ER = 1 + b,\ E(RX) = c\}.$$

Let us show that $\varphi$ is continuous at zero. Denote by $C(a, b, c) \subset L^q$ the set
of $R \in L^q$ satisfying the constraints of the problem. It will be enough to show
that if $\|a\|_q$, $|b|$, $|c|$ are sufficiently small then for any $R \in C(0, 0, 0)$ there exists
$\tilde R \in C(a, b, c)$ such that $\|R - \tilde R\|_q \le (\|R\|_q + K)(\|a\|_q + |b| + |c|)$ and vice versa.
Here $K$ is some fixed constant.

Since $P(X < 0) \ne 0$, there exists $\xi \in L^\infty$ such that $\xi \ge 0$ a.s. and $E(\xi X) = -1$.
If $R \in C(0, 0, 0)$, then one can take the required $\tilde R \in C(a, b, c)$ in the form

$$\tilde R = \begin{cases} a + \lambda_1 R + \lambda_2 \xi, & \text{if } E(aX) \ge c, \\ a + \mu_1 R + \mu_2, & \text{if } E(aX) < c, \end{cases}$$

where the non-negative constants $\lambda_1, \lambda_2, \mu_1, \mu_2$ can be easily found from the
constraint $\tilde R \in C(a, b, c)$, and it turns out that $\lambda_1, \mu_1 = 1 + O(\|a\|_q + |b| + |c|)$
and $\lambda_2, \mu_2 = O(\|a\|_q + |b| + |c|)$. If $R \in C(a, b, c)$, then take

$$\tilde R = \begin{cases} \lambda_1 (R - a + \lambda_2 \xi), & \text{if } c \ge E(aX), \\ \mu_1 (R - a + \mu_2), & \text{if } c < E(aX), \end{cases}$$

with $\lambda_i$, $\mu_i$ making $\tilde R \in C(0, 0, 0)$.
Thus, the strong duality holds in (6) and we have

$$(S_p(X))^q = \sup_{u \in L^q_+,\ v, w \in \mathbb{R}} g(u, v, w) \tag{7}$$

with the dual objective function $g\colon L^q_+ \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$,

$$\begin{aligned}
g(u, v, w) &= \inf_{R \in L^q} E\left(|R|^q + R(u + v + wX) - u - w\right) \\
&= -E\left(\frac{q - 1}{q^p} |u + v + wX|^p + u + w\right),
\end{aligned}$$

where the second equality is obtained by choosing $R$ which minimizes the
expression under the expectation for every random outcome.

Observe that for any fixed $v, w \in \mathbb{R}$ the optimal $u^* = u^*(v, w)$ in (7) can
be found explicitly: $u^* = (v + wX + q)_-$. Then by straightforward algebraic
transformation we obtain (3).
For $p = 2$, from (3) we get

$$(S_2(X))^2 = \max_{a, b \in \mathbb{R}} \left\{ b - \frac{1}{4} E(aX + b)_+^2 - 1 \right\}.$$

It is easy to see that it is enough to maximize only over $b > 0$. Maximizing over $b$
and introducing the variable $c = -\frac{a}{b}$, we obtain representation (4) for $p = 2$.

To obtain representation (4) for $p = 1$, let’s again consider problem (5). Similarly
to (7) (the only change will be to use $\|R\|_q$ instead of $E|R|^q$), we can obtain that

$$S_1(X) = \sup_{u \in L^\infty_+,\ v, w \in \mathbb{R}} g(u, v, w),$$

where now we denote

$$g(u, v, w) = \inf_{R \in L^\infty} \{\|R\|_\infty + E(R(u + v + wX) - u) - w\}.$$

Observe that a necessary condition for $g(u, v, w) > -\infty$ is that $E|u + v + wX| \le 1$:
otherwise take $R = c(I(u + v + wX \le 0) - I(u + v + wX > 0))$ and let $c \to \infty$.
Under this condition we have $g(u, v, w) = -Eu - w$ since from Hölder’s inequality
$|E((u + v + wX)R)| \le \|R\|_\infty$ and therefore the infimum in $g$ is attained at $R = 0$
a.s. Consequently, the dual problem becomes

$$S_1(X) = -\inf_{u \in L^\infty,\ v, w \in \mathbb{R}} \{Eu + w \mid u \ge 0 \text{ a.s.},\ E|u + v + wX| \le 1\}. \tag{8}$$

Observe that the value of the infimum is non-positive, and so it is enough to restrict
the values of $w$ to $\mathbb{R}_-$ only. Let’s fix $v \in \mathbb{R}$, $w \in \mathbb{R}_-$ and find the optimal
$u^* = u^*(v, w)$. Clearly, whenever $v + wX(\omega) \ge 0$, it’s optimal to take $u^*(\omega) = 0$.
Whenever $v + wX(\omega) < 0$, we should have $u^*(\omega) \le |v + wX(\omega)|$, so that
$u^*(\omega) + v + wX(\omega) \le 0$ (otherwise, the choice $u^*(\omega) = |v + wX(\omega)|$ will be better). Thus
for the optimal $u^*$

$$E|u^* + v + wX| = E|v + wX| - Eu^*.$$

In particular, for the optimal $u^*$ the inequality in the second constraint in (8) should
be satisfied as the equality, since otherwise it would be possible to find a smaller
$u^*$. Observe that if $E(v + wX)_+ > 1$, then no $u \in L^\infty$ exists which satisfies the
constraint of the problem. On the other hand, if $E(v + wX)_+ \le 1$ then at least one
such $u$ exists. Consequently, problem (8) can be rewritten as follows:

$$-S_1(X) = \inf_{v \in \mathbb{R},\ w \in \mathbb{R}_-} \{E|v + wX| + w - 1 \mid E(v + wX)_+ \le 1\}.$$

Clearly, $E|v^* + w^* X| + w^* \le 0$ for the optimal pair $(v^*, w^*)$ (the pair $v = w = 0$
is feasible and gives the value $-1$), so the constraint should be
satisfied as the equality (otherwise multiply both $v$, $w$ by $1/E(v + wX)_+$, which will
decrease the value of the objective function). By a straightforward transformation,
we get

$$1 + S_1(X) = \sup_{v \in \mathbb{R},\ w \in \mathbb{R}_-} \{v \mid E(v + wX)_+ = 1\}$$

and introducing the new variable $c = -w/v$, we obtain representation (4).


2.3 Basic Properties


Theorem 2 For any $p \in [1, \infty)$, the monotone Sharpe ratio in $L^p$ satisfies the
following properties.

(a) (Quasi-concavity) For any $c \in \mathbb{R}$, the set $\{X \in L^p : S_p(X) \ge c\}$ is convex.

(b) (Scaling invariance) $S_p(\lambda X) = S_p(X)$ for any real $\lambda > 0$.

(c) (Law invariance) If $X$ and $Y$ have the same distribution, then $S_p(X) = S_p(Y)$.

(d) (2nd order monotonicity) If $X$ dominates $Y$ in the second stochastic order, then
$S_p(X) \ge S_p(Y)$.

(e) (Continuity) $S_p(X)$ is continuous with respect to the $L^p$-norm at any $X$ such that
$EX > 0$ and $P(X < 0) \ne 0$.


Before proving this theorem, let us briefly discuss the properties in the context of 
the portfolio selection problem. 

The quasi-concavity implies that the monotone Sharpe ratio favors portfolio
diversification: if $S_p(X) \ge c$ and $S_p(Y) \ge c$, then $S_p(\lambda X + (1 - \lambda)Y) \ge c$ for
any $\lambda \in [0, 1]$, where $\lambda X + (1 - \lambda)Y$ can be thought of as diversification between
portfolios with returns $X$ and $Y$. Note that the property of quasi-concavity is weaker
than concavity; it’s not difficult to provide an example showing that the monotone
Sharpe ratio is not concave.

The scaling invariance can be interpreted as saying that the monotone Sharpe ratio
cannot be changed by leveraging a portfolio (in the same way as the standard Sharpe
ratio). Namely, suppose $X = R_x$, where $R_x = \langle x, R \rangle$ is the return of portfolio
$x \in \mathbb{R}^{n+1}$ (as in Sect. 2.1), $\sum_i x_i = 1$. Consider a leveraged portfolio $\tilde x$ with
$\tilde x_i = \lambda x_i$, $i \ge 1$, and $\tilde x_0 = 1 - \sum_{i \ge 1} \tilde x_i$, i.e. a portfolio which is obtained from $x$ by
proportionally scaling all the risky positions. Then it’s easy to see that $R_{\tilde x} = \lambda R_x$,
and so $S_p(R_{\tilde x}) = S_p(R_x)$.
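
The scaling invariance, and the contrast with the standard Sharpe ratio under pointwise ordering of returns, can be illustrated on a small sample. The sketch below is a self-contained example for $p = 2$ with zero risk-free rate; it evaluates $S_2$ through the representation $1/(1 + S_2^2) = \min_c E(1 - cX)_+^2$, and the function names, the leverage factor and the data are hypothetical.

```python
def monotone_sharpe_2(sample, iterations=200):
    """S_2 of an equally weighted sample via 1 / (1 + S_2^2) = min_c E (1 - c X)_+^2."""
    n = len(sample)

    def objective(c):
        return sum(max(1.0 - c * x, 0.0) ** 2 for x in sample) / n

    lo, hi = 0.0, (n ** 0.5 - 1.0) / abs(min(sample))   # bracket valid when E X > 0 > min X
    for _ in range(iterations):                          # ternary search, convex objective
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    return (1.0 / objective(0.5 * (lo + hi)) - 1.0) ** 0.5

def sharpe(sample):
    """Standard Sharpe ratio (zero risk-free rate) of an equally weighted sample."""
    mean = sum(sample) / len(sample)
    var = sum((v - mean) ** 2 for v in sample) / len(sample)
    return mean / var ** 0.5

x = [-1.0, 1.0, 1.0, 1.0, 10.0]          # hypothetical returns
y = [-1.0, 1.0, 1.0, 1.0, 2.0]           # y <= x in every scenario
leveraged = [3.0 * v for v in x]         # leverage factor 3

print(f"standard: S(x) = {sharpe(x):.4f}, S(y) = {sharpe(y):.4f}")
print(f"monotone: S_2(x) = {monotone_sharpe_2(x):.4f}, S_2(y) = {monotone_sharpe_2(y):.4f}")
print(f"leverage: S_2(3x) = {monotone_sharpe_2(leveraged):.4f}")
```

Here the standard Sharpe ratio ranks the dominated return `y` above `x`, while the monotone version does not, and multiplying `x` by the leverage factor leaves $S_2$ unchanged.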

Law invariance, obviously, states that we are able to evaluate the performance 
knowing only the distribution of the return. The interpretation of the continuity 
property is also clear. 

The 2nd order monotonicity means that $S_p$ is consistent with preferences of risk-averse
investors. Recall that the distribution of a random variable $X$ is said to
dominate the distribution of $Y$ in the 2nd stochastic order, which we denote by
$X \succcurlyeq Y$, if $EU(X) \ge EU(Y)$ for any increasing concave function $U$ such that $EU(X)$
and $EU(Y)$ exist. Such a function $U$ can be interpreted as a utility function, and
then the 2nd order stochastic dominance means that $X$ is preferred to $Y$ by any
risk-averse investor.

Regarding the properties from Theorem 2, let us also mention the paper [3],
which studies performance measures by an axiomatic approach in a fashion similar
to the axiomatics of coherent and convex risk measures. The authors define a
performance measure (also called an acceptability index) as a functional satisfying
certain properties, then investigate implications of those axioms, and show a deep
connection with coherent risk measures, as well as provide examples of performance
measures. The minimal set of four axioms a performance measure should satisfy
consists of quasi-concavity, monotonicity, scaling invariance and semicontinuity
(in the form of the so-called Fatou property in $L^\infty$, as the paper [3] considers only
functionals on $L^\infty$). In particular, the monotone Sharpe ratio satisfies those axioms
and thus provides a new example of a performance measure in the sense of this
system of axioms. It also satisfies all the additional natural properties discussed in
that paper: the law invariance, the arbitrage consistency ($S_p(X) = +\infty$ iff $X \ge 0$
a.s. and $P(X > 0) \ne 0$) and the expectation consistency (if $EX < 0$ then $S_p(X) = 0$,
and if $EX > 0$ then $S_p(X) > 0$; this property is satisfied for $p > 1$).


Proof (Proof of Theorem 2) Quasi-concavity follows from the fact that the $L^p$-Sharpe
ratio $X \mapsto EX/\sigma_p(X)$ is quasi-concave. Indeed, if $EX/\sigma_p(X) \ge c$ and
$EY/\sigma_p(Y) \ge c$, then

$$\frac{E(\lambda X + (1 - \lambda)Y)}{\sigma_p(\lambda X + (1 - \lambda)Y)} \ge \frac{\lambda EX + (1 - \lambda)EY}{\lambda \sigma_p(X) + (1 - \lambda)\sigma_p(Y)} \ge c$$

for any $\lambda \in [0, 1]$. Since $S_p$ is the maximum of the functions
$f_Z(X) = E(X - Z)/\sigma_p(X - Z)$ over $Z \in L^p_+$, the quasi-concavity is preserved.

The scaling invariance is obvious. Since the expectation and the $L^p$-deviation
are law invariant, in order to prove the law invariance of $S_p$, it is enough to show
that the supremum in the definition of $S_p(X)$ can be taken over only those $Y \le X$ which
are measurable with respect to the $\sigma$-algebra generated by $X$, or, in other words,
$Y = f(X)$ for some measurable function $f$ on $\mathbb{R}$. But this follows from the fact
that if for any $Y \le X$ one considers $\tilde Y = E(Y \mid X)$, then $\tilde Y \le X$, $E\tilde Y = EY$ and
$\sigma_p(\tilde Y) \le \sigma_p(Y)$, hence $E\tilde Y/\sigma_p(\tilde Y) \ge EY/\sigma_p(Y)$.

To prove the 2nd order monotonicity, recall that another characterization of the
2nd order stochastic dominance is as follows: $X_1 \preccurlyeq X_2$ if and only if there exist
random variables $X_2'$ and $Z$ (which may be defined on another probability space)
such that $X_2 \overset{d}{=} X_2'$, $X_1 \overset{d}{=} X_2' + Z$ and $E(Z \mid X_2') \le 0$. Suppose $X_1 \preccurlyeq X_2$. From
the law invariance, without loss of generality, we may assume that $X_1$, $X_2$, $Z$ are
defined on the same probability space. Then for any $Y_1 \le X_1$ take $Y_2 = E(Y_1 \mid X_2)$.
Clearly, $Y_2 \le X_2$, $EY_2 = EY_1$ and $\sigma_p(Y_2) \le \sigma_p(Y_1)$. Hence $S_p(X_1) \le S_p(X_2)$.

Finally, the continuity of $S_p(X)$ follows from the fact that the expectation and the
$L^p$-deviation are uniformly continuous.


3 Buffered Probabilities 


The paper [21] introduced the so-called buffered probability, which is defined
as the inverse function of the conditional value at risk (with respect to the risk level).
The authors of that and other papers (for example, [7]) argue that in stochastic
optimization problems related to minimization of the probability of adverse events, the
buffered probability can serve as a better optimality criterion compared to the usual
probability.

In this section we show that the monotone Sharpe ratio is tightly related to the
buffered probability, especially in the cases $p = 1, 2$. In particular, this will provide
a connection of the monotone Sharpe ratio with the conditional value at risk. We begin
with a review of the conditional value at risk and its generalization to the spaces $L^p$.
Then we give a definition of the buffered probability, which will generalize the one
in [12, 21] from $L^1$ to arbitrary $L^p$.


3.1 A Review of the Conditional Value at Risk 


Let X be a random variable, which describes loss. As opposed to the previous 
section, now large values are bad, small values are good (negative values are profits). 
For a moment, to avoid technical difficulties, assume that X has a continuous 
distribution. 

Denote by $Q(X, \lambda)$ the $\lambda$-th quantile of the distribution of $X$, $\lambda \in [0, 1]$, i.e. $Q(X, \lambda)$ is a number $x \in \mathbb{R}$, not necessarily uniquely defined, such that $P(X \le x) = \lambda$. The quantile $Q(X, \lambda)$ is also called the value at risk¹ (VAR) of $X$ at level $\lambda$, and it shows that in the worst case of probability $1 - \lambda$, the loss will be at least $Q(X, \lambda)$. This interpretation makes VAR a sort of a measure of risk (in a broad meaning of this term), and it is widely used by practitioners.

However, it is well-known that VAR lacks certain properties that one expects from a measure of risk. One of the most important drawbacks is that it doesn't show what happens with probability less than 1 − λ. For example, an investment strategy which loses $1 million with probability 1% and $2 million with probability 0.5% is quite different from a strategy which loses $1 million and $10 million with the same probabilities; however, they will have the same VAR at the 99% level. Another drawback of VAR is that it's not convex; as a consequence, it may not favor diversification of risk, which leads to concentration of risk (above the 1 − λ level).

The conditional value at risk (CVAR; which is also called the average value at risk, or the expected shortfall, or the superquantile) is considered as an improvement of VAR. Recall that if $X \in L^1$ and has a continuous distribution, then CVAR of $X$ at risk level $\lambda \in [0, 1]$ can be defined as the conditional expectation in its right tail of probability $1 - \lambda$, i.e.

$$\mathrm{CVAR}(X, \lambda) = E(X \mid X > Q(X, \lambda)). \tag{9}$$

We will also use the notation $\overline{Q}(X, \lambda) = \mathrm{CVAR}(X, \lambda)$ to emphasize the connection with quantiles.

CVAR provides a basic (and the most used) example of a coherent risk measure. The theory of risk measures, originally introduced in the seminal paper [1], now plays a prominent role in applications in finance. We are not going to discuss all the benefits of using coherent (and convex) risk measures in optimization problems; a modern review of the main results in this theory can be found, for example, in the monograph [8].

¹ Some authors use definitions of VAR and CVAR which are slightly different from the ones used here: for example, take $(-X)$ instead of $X$, or $1 - \lambda$ instead of $\lambda$, etc.

Rockafellar and Uryasev [22] proved that CVAR admits the following representation through the optimization problem

$$\overline{Q}(X, \lambda) = \min_{c \in \mathbb{R}} \Big( \frac{1}{1-\lambda} E(X - c)_+ + c \Big). \tag{10}$$


Actually, this formula can be used as a general definition for CVAR, which works in 
the case of any distribution of X, not necessarily continuous. The importance of this 
representation is that it provides an efficient method to compute CVAR, which in 
practical applications often becomes much faster than e.g. using formula (9). It also 
behaves “nicely” when CVAR is used as a constraint or an optimality criterion in 
convex optimization problems, for example portfolio selection. Details can be found 
in [22]. 
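
As an illustration of representation (10), the following minimal sketch evaluates VAR and CVAR for the two loss distributions from the VAR example above. It assumes discrete losses given as atoms with probabilities; the function names and the quantile convention (smallest atom whose cumulative probability reaches the level) are choices made here for illustration, not taken from [22].

```python
def var_discrete(values, probs, level):
    """Smallest atom x with P(X <= x) >= level (one common quantile convention)."""
    cumulative = 0.0
    for value, prob in sorted(zip(values, probs)):
        cumulative += prob
        if cumulative >= level - 1e-12:  # small slack for rounding
            return value
    return max(values)


def cvar_discrete(values, probs, level):
    """CVaR via formula (10): min over c of E(X - c)_+ / (1 - level) + c."""
    def objective(c):
        tail = sum(prob * max(value - c, 0.0) for value, prob in zip(values, probs))
        return tail / (1.0 - level) + c
    # The objective is convex and piecewise linear in c with kinks at the atoms,
    # so the minimum is attained at one of the atoms.
    return min(objective(c) for c in values)


# Losses in $ million for the two strategies from the VAR example above.
strategy_a = ([0.0, 1.0, 2.0], [0.985, 0.010, 0.005])
strategy_b = ([0.0, 1.0, 10.0], [0.985, 0.010, 0.005])

for name, (values, probs) in (("A", strategy_a), ("B", strategy_b)):
    print(name, var_discrete(values, probs, 0.99), cvar_discrete(values, probs, 0.99))
```

Under these assumptions both strategies have VAR equal to 1 at the 99% level, while CVAR is about 1.5 for the first strategy and about 5.5 for the second, so CVAR does distinguish them.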

Representation (10) readily suggests how CVAR can be generalized to “put more weight” on the right tail of the distribution of $X$, which provides a coherent risk measure for the space $L^p$.


Definition 3 For $X \in L^p$, define the $L^p$-CVAR at level $\lambda \in [0, 1)$ by

$$\overline{Q}_p(X, \lambda) = \min_{c \in \mathbb{R}} \Big( \frac{1}{1-\lambda} \|(X - c)_+\|_p + c \Big).$$


The $L^p$-CVAR was studied, for example, in the papers [2, 9]. In particular, in [9], it was argued that higher values of $p$ may provide better results than the standard CVAR ($p = 1$) in certain portfolio selection problems. For us, the cases $p = 1, 2$ will be the most interesting due to the direct connection with the monotone Sharpe ratio, as will be shown in the next section.
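
Definition 3 can be evaluated numerically by a one-dimensional convex minimization. The sketch below does this for a discrete loss and $0 < \lambda < 1$ with a ternary search; the function name, the search bracket and the iteration count are illustrative assumptions, not part of [2, 9].

```python
def lp_cvar(values, probs, level, p, iters=200):
    """L^p-CVaR of a discrete loss (Definition 3), for 0 < level < 1 and p >= 1."""
    def objective(c):
        moment = sum(q * max(v - c, 0.0) ** p for v, q in zip(values, probs))
        return moment ** (1.0 / p) / (1.0 - level) + c

    mean = sum(q * v for v, q in zip(values, probs))
    hi = max(values)                          # objective(c) = c for c >= max(values)
    lo = (mean - (1.0 - level) * hi) / level  # below this, objective(c) > objective(hi)
    for _ in range(iters):                    # ternary search on a convex function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    return objective(0.5 * (lo + hi))


loss = ([0.0, 1.0, 2.0], [0.985, 0.010, 0.005])  # same toy loss as in the VAR example
print(lp_cvar(loss[0], loss[1], 0.99, 1), lp_cvar(loss[0], loss[1], 0.99, 2))
```

The lower end of the bracket uses the bound $\|(X-c)_+\|_p \ge E(X - c)$, so any minimizer lies in the searched interval. For this toy loss the value for $p = 2$ at level 0.99 already equals the largest possible loss, which illustrates how larger $p$ puts more weight on the right tail.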


It is known that the following dual representation holds for $L^p$-CVAR, which we will use below: for any $X \in L^p$ and $\lambda \in [0, 1)$

$$\overline{Q}_p(X, \lambda) = \sup\{E(RX) \mid R \in L^q_+,\ \|R\|_q \le (1-\lambda)^{-1},\ ER = 1\}, \tag{11}$$

where, as usual, $\frac{1}{p} + \frac{1}{q} = 1$. This result is proved in [2].


3.2 The Definition of Buffered Probability and Its Representation


Consider the function inverse to CVAR in $\lambda$, that is, for a random variable $X$ and $x \in \mathbb{R}$ define $\overline{P}(X, x) = 1 - \lambda$ where $\lambda$ is such that $\overline{Q}(X, \lambda) = x$ (some care should be taken at points of discontinuity; a formal definition is given below). In the papers [12, 13, 21], $\overline{P}(X, x)$ was called the “buffered” probability that $X > x$; we explain the rationale behind this name below. At this moment, it may seem that from a purely mathematical point of view such a simple operation as function inversion probably shouldn't deserve much attention. But that's not the case if we take applications into account. For this reason, before we give any definitions, let us provide some arguments for why studying $\overline{P}(X, x)$ may be useful for applications.

In many practical optimization problems one may want to consider constraints
defined in terms of probabilities of adverse events, or to use those probabilities
as optimization criteria. For example, an investment fund manager may want 
to maximize the expected return of her portfolio under the constraint that the 
probability of a loss more than $1 million should be less than 1%; or an engineer 
wants to minimize the construction cost of a structure provided that the tension in 
its core part can exceed a critical threshold with only a very small probability during 
its lifetime. 

Unfortunately, the probability has all the same drawbacks as the value at risk, 
which were mentioned above: it’s not necessarily convex, continuous and doesn’t 
provide information about how wrong things can go if an adverse event indeed 
happens. For those reasons, CVAR may be a better risk measure, which allows 
to avoid some of the problems. For example, if using CVAR, the above investor 
can reformulate her problem as maximization of the expected return given that the 
average loss in the worst 1% of cases doesn’t exceed $1 million. However, such a 
setting of the problem may be inconvenient, as CVAR “speaks” in terms of quantiles, 
but one may need the answer in terms of probabilities. For example, $1 million may
be the value of the liquid assets of the fund which can be quickly and easily sold to cover
a loss; so the manager must ensure that the loss doesn’t exceed this amount. But it
is not clear how she can use the information about the average loss which CVAR 
provides. A similar problem arises in the example with an engineer. 

In [21], Rockafellar and Royset proposed the idea that the inverse of CVAR may 
be appropriate for such cases: since quantiles and probabilities are mutually inverse, 
and CVAR is a better alternative to quantiles, then one can expect that the inverse 
of CVAR, the buffered probability, could be a better alternative to probability. Here, 
we follow this idea. 

Note that, in theory, it is possible to invert CVAR as a function in $\lambda$, but, in practice, computational difficulty may be a serious problem for doing that: it may take too much time to compute CVAR for a complex system even for one fixed level of risk $\lambda$, so inversion, which requires such a computation for several $\lambda$, may not be feasible (and this is often the case in complex engineering or financial models).
Therefore, we would like to be able to work directly with buffered probabilities, 
and have an efficient method to compute them. We’ll see that the representation 
given below turns out to give more than just an efficient method of computation. In 
particular, in view of Sect. 2, it will show a connection with the monotone Sharpe 
ratio, a result which is by no means obvious. 

The following simple lemma will be needed to show that it is possible to invert 
CVAR. 

Lemma 2 For $X \in L^p$, $p \in [1, \infty)$, the function $f(\lambda) = \overline{Q}_p(X, \lambda)$ defined for $\lambda \in [0, 1)$ has the following properties:


1. $f(0) = EX$;

2. $f(\lambda)$ is continuous and non-decreasing;

3. $f(\lambda)$ is strictly increasing on the set $\{\lambda : f(\lambda) < \operatorname{ess\,sup} X\}$;

4. if $P := P(X = \operatorname{ess\,sup} X) > 0$, then $f(\lambda) = \operatorname{ess\,sup} X$ for $\lambda \in [1 - P^{1/p}, 1)$.

Proof The first property obviously follows from the dual representation, and the second one can be easily obtained from the definition. To prove the third property, observe that if $\overline{Q}_p(X, \lambda) < \operatorname{ess\,sup} X$, then the minimum in the definition is attained at some $c^* < \operatorname{ess\,sup} X$. So, for any $\lambda' < \lambda$ we have $\overline{Q}_p(X, \lambda') \le \frac{1}{1-\lambda'}\|(X - c^*)_+\|_p + c^* < \overline{Q}_p(X, \lambda)$, using that $\|(X - c^*)_+\|_p > 0$.

Finally, the fourth property follows from the fact that if $P > 0$, and, in particular, $\operatorname{ess\,sup} X < \infty$, then $\overline{Q}_p(X, \lambda) \le \operatorname{ess\,sup} X$ for any $\lambda \in [0, 1)$, as one can take $c = \operatorname{ess\,sup} X$ in the definition. On the other hand, for $\lambda_0 = 1 - P^{1/p}$ we have that $R = P^{-1} I(X = \operatorname{ess\,sup} X)$ satisfies the constraint in the dual representation and $E(RX) = \operatorname{ess\,sup} X$. Hence $\overline{Q}_p(X, \lambda_0) = \operatorname{ess\,sup} X$, and then $\overline{Q}_p(X, \lambda) = \operatorname{ess\,sup} X$ for any $\lambda \ge \lambda_0$ by the monotonicity.


Definition 4 For $X \in L^p$, $p \in [1, \infty)$, and $x \in \mathbb{R}$, set

$$\overline{P}_p(X, x) = \begin{cases} 0, & \text{if } x > \operatorname{ess\,sup} X, \\ (P(X = \operatorname{ess\,sup} X))^{1/p}, & \text{if } x = \operatorname{ess\,sup} X, \\ 1 - \overline{Q}_p^{-1}(X, x), & \text{if } EX < x < \operatorname{ess\,sup} X, \\ 1, & \text{if } x \le EX. \end{cases}$$


The “main” case in this definition is the third one. In particular, one can see that for a random variable $X$ which has a distribution with a support unbounded from above, the first and the second cases do not occur. Figure 1 schematically shows the relation between the quantile function, the CVAR, the probability distribution function, and the buffered probability. In particular, it is easy to see that always $\overline{P}(X, x) \ge P(X > x)$. According to the terminology of [21], the difference between these two quantities is a “safety buffer”, hence the name buffered probability.


Theorem 3 For any $X \in L^p$

$$\overline{P}_p(X, x) = \min_{c \ge 0} \|(c(X - x) + 1)_+\|_p. \tag{12}$$


Proof For the case $p = 1$ this result was proved in [12]. Here, we follow the same idea, but for general $p \in [1, \infty)$. Without loss of generality, we can assume $x = 0$, otherwise consider $X - x$ instead of $X$.




Fig. 1 Left: quantile and distribution functions. Right: complementary probability distribution function and buffered probability $\overline{P}(X, x)$. In this example, $\operatorname{ess\,sup} X < \infty$, but $P(X = \operatorname{ess\,sup} X) = 0$, so $\overline{P}(X, x)$ is continuous everywhere. (Axis marks in the original figure: $\operatorname{ess\,sup} X$, $EX$, $0$ and $\lambda = 1$.)


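Case 1: $EX < 0$, $\operatorname{ess\,sup} X > 0$. By Lemma 2 and the definition of $\overline{Q}_p$ we have

$$\begin{aligned}
\overline{P}_p(X, 0) &= \min\{\lambda \in (0, 1) \mid \overline{Q}_p(X, 1 - \lambda) = 0\} \\
&= \min_{\lambda \in (0,1)} \Big\{\lambda \;\Big|\; \min_{c \in \mathbb{R}} \Big(\frac{1}{\lambda}\|(X - c)_+\|_p + c\Big) = 0\Big\} \\
&= \min_{\lambda \in (0,1),\ c \in \mathbb{R}} \{\lambda \mid \|(X + c)_+\|_p = \lambda c\}.
\end{aligned}$$

Observe that the minimum here can be computed only over $c > 0$ (since for $c \le 0$ the constraint is obviously not satisfied). Then dividing both parts of the equality in the constraint by $c$ we get

$$\overline{P}_p(X, 0) = \min_{c > 0} \|(c^{-1}X + 1)_+\|_p,$$

which is obviously equivalent to (12).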
Case 2: $EX \ge 0$. We need to show that $\min_{c \ge 0} \|(cX + 1)_+\|_p = 1$. This clearly follows from the fact that $\min_{c \ge 0} \|(cX + 1)_+\|_p \ge \min_{c \ge 0} E(cX + 1) = 1$, while the value 1 is attained at $c = 0$.

Case 3: $\operatorname{ess\,sup} X = 0$. Now $\|(cX + 1)_+\|_p \ge P(X = 0)^{1/p}$ for any $c \ge 0$, while $\|(cX + 1)_+\|_p \to P(X = 0)^{1/p}$ as $c \to +\infty$. Hence $\min_{c \ge 0} \|(cX + 1)_+\|_p = P(X = 0)^{1/p}$ as claimed.

Case 4: $\operatorname{ess\,sup} X < 0$. Similarly, $\|(cX + 1)_+\|_p \to 0$ as $c \to +\infty$.
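
Representation (12) gives a direct way to compute buffered probabilities without inverting CVAR. The following minimal sketch does this for a discrete random variable in the "main" case $x < \operatorname{ess\,sup} X$; the function name, the search bracket and the ternary search are illustrative choices, not taken from [12].

```python
def buffered_probability(values, probs, x, p, iters=200):
    """Buffered probability of {X > x} via (12): min over c >= 0 of ||(c(X - x) + 1)_+||_p."""
    top, q_top = max(zip(values, probs))
    if not x < top:
        raise ValueError("this sketch needs x < max(values)")

    def objective(c):
        moment = sum(q * max(c * (v - x) + 1.0, 0.0) ** p for v, q in zip(values, probs))
        return moment ** (1.0 / p)

    # objective(c) >= q_top**(1/p) * (c * (top - x) + 1) and objective(0) = 1,
    # so any minimizer satisfies c <= (q_top**(-1/p) - 1) / (top - x).
    lo, hi = 0.0, (q_top ** (-1.0 / p) - 1.0) / (top - x)
    for _ in range(iters):  # ternary search on a convex function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    return objective(0.5 * (lo + hi))


# Toy loss: 0 w.p. 0.985, 1 w.p. 0.01, 2 w.p. 0.005; its CVAR at level 0.99 is 1.5.
loss = ([0.0, 1.0, 2.0], [0.985, 0.010, 0.005])
print(buffered_probability(loss[0], loss[1], 1.5, 1))  # approximately 0.01 = 1 - 0.99
```

For this toy loss the plain probability $P(X > 1.5)$ is 0.005, so the buffered probability 0.01 exceeds it, in line with the remark after Definition 4.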

From formula (12), one can easily see the connection between the monotone Sharpe ratio and the buffered probability for $p = 1, 2$: for any $X \in L^p$

$$\frac{1}{1 + (S_p(X))^p} = (\overline{P}_p(-X, 0))^p.$$

In particular, if $X$ is the return of a portfolio, then a portfolio selection problem where one wants to maximize the monotone Sharpe ratio of the portfolio return becomes equivalent to the minimization of the buffered probability that $(-X)$ exceeds 0, i.e. the buffered probability of loss. This is a nice (and somewhat unexpected) connection between the classical portfolio theory and modern developments in risk evaluation!
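
The relation above can be used to read off the monotone Sharpe ratio from the buffered probability of loss when $p = 1$ or $p = 2$. The sketch below assumes a discrete return with at least one negative scenario; the function names and the toy return (2 or −1 with equal probabilities) are hypothetical and serve only as an illustration.

```python
def buffered_probability_of_loss(returns, probs, p, iters=200):
    """min over c >= 0 of ||(1 - c X)_+||_p, i.e. the buffered probability that -X exceeds 0."""
    worst, q_worst = min(zip(returns, probs))
    if not worst < 0.0:
        raise ValueError("this sketch needs a scenario with a negative return")

    def objective(c):
        moment = sum(q * max(1.0 - c * r, 0.0) ** p for r, q in zip(returns, probs))
        return moment ** (1.0 / p)

    # objective(0) = 1 bounds the minimizer: c <= (q_worst**(-1/p) - 1) / |worst|.
    lo, hi = 0.0, (q_worst ** (-1.0 / p) - 1.0) / (-worst)
    for _ in range(iters):  # ternary search on a convex function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    return objective(0.5 * (lo + hi))


def implied_monotone_sharpe(returns, probs, p):
    """Solve 1 / (1 + S^p) = P^p for S; the relation is stated only for p = 1, 2."""
    bp = buffered_probability_of_loss(returns, probs, p)
    return max(bp ** (-p) - 1.0, 0.0) ** (1.0 / p)


toy = ([2.0, -1.0], [0.5, 0.5])
print(implied_monotone_sharpe(toy[0], toy[1], 1), implied_monotone_sharpe(toy[0], toy[1], 2))
```

For this toy return both values are about 1/3; for $p = 2$ this coincides with the standard Sharpe ratio $0.5/1.5$, since the truncation $(\cdot)_+$ is inactive at the optimal $c = 0.2$.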

One can ask whether a similar relation between $\overline{P}_p$ and $S_p$ holds for other values of $p$. Unfortunately, in general, there seems to be no simple formula connecting them. It can be shown that they can be represented as the following optimization problems:

Sp(X) = min {||R — 1lg |ER = 1, E(RX) = 1), 
ReL4 


P,(X,0) = min {||Rllg | ER = 1, E(RX) = 1}, 
ReL4 


which have the same constraint sets but different objective functions. The first 
formula here easily follows from (5), the second one can be obtained using the 
dual representation of CVAR (11). 


3.3 Properties


In this section we investigate some basic properties of $\overline{P}_p(X, x)$ both in $X$ and $x$, and discuss its usage in portfolio selection problems. One of the main points of this section is that buffered probabilities (of loss) can be used as optimality criteria, similarly to monotone Sharpe ratios (and in the cases $p \ne 1, 2$ they are more convenient due to a simpler representation).


Theorem 4 Suppose $X \in L^p$, $x \in \mathbb{R}$ and $p \in [1, \infty)$. Then $\overline{P}_p(X, x)$ has the following properties.


1. The function $x \mapsto \overline{P}_p(X, x)$ is continuous and strictly decreasing on $[EX, \operatorname{ess\,sup} X)$, and non-increasing on the whole $\mathbb{R}$.

2. The function $X \mapsto \overline{P}_p(X, x)$ is quasi-convex, law invariant, 2nd order monotone, continuous with respect to the $L^p$-norm, and concave with respect to mixtures of distributions.

3. The function $p \mapsto \overline{P}_p(X, x)$ is non-decreasing in $p$.

For $p = 1$, similar results can be found in [12]; the proofs are similar as well (except property 3, but it obviously follows from the Lyapunov inequality), so we do not provide them here.

Regarding the second property, note that although $\overline{P}_p(X, x)$ is quasi-convex in $X$, it's not convex in $X$, as the following simple example shows: consider $X \equiv 2$ and $Y \equiv -1$; then $\overline{P}_p((X + Y)/2, 0) = 1 > \frac{1}{2} = \frac{1}{2}\overline{P}_p(X, 0) + \frac{1}{2}\overline{P}_p(Y, 0)$.

Also recall that the mixture of two distributions on $\mathbb{R}$ specified by their distribution functions $F_1(x)$ and $F_2(x)$ is defined as the distribution $F(x) = \lambda F_1(x) + (1 - \lambda) F_2(x)$ for any fixed $\lambda \in [0, 1]$. We write $X \overset{d}{=} \lambda X_1 \oplus (1 - \lambda) X_2$ if the distribution of a random variable $X$ is the mixture of the distributions of $X_1$ and $X_2$. If $\xi$ is a random variable taking values 1, 2 with probabilities $\lambda$, $1 - \lambda$ and independent of $X_1$, $X_2$, then clearly $X \overset{d}{=} X_\xi$. Concavity of $\overline{P}_p(X, x)$ with respect to mixtures of distributions means that $\overline{P}_p(X, x) \ge \lambda \overline{P}_p(X_1, x) + (1 - \lambda) \overline{P}_p(X_2, x)$.

Now let's look in more detail at how a simple portfolio selection problem can be formulated with $\overline{P}_p$. Assume the same setting as in Sect. 2.1: $R$ is an $(n + 1)$-dimensional vector of asset returns, the first asset is riskless with the rate of return $r$, and the other $n$ assets are risky with random returns in $L^p$. Let $R_x = \langle x, R \rangle$ denote the return of a portfolio $x \in \mathbb{R}^{n+1}$, and $\delta > 0$ be a fixed number, a required expected return premium. Consider the following optimization problem:

$$\begin{aligned}
&\text{minimize} && \overline{P}_p(r - R_x, 0) \quad \text{over } x \in \mathbb{R}^{n+1} \\
&\text{subject to} && E(R_x - r) = \delta, \\
&&& \textstyle\sum_{i=0}^{n} x_i = 1.
\end{aligned} \tag{13}$$


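In other words, an investor wants to minimize the buffered probability that the return of her portfolio will be less than the riskless return subject to the constraint on the expected return. Denote the vector of adjusted risky returns $\overline{R} = (R_1 - r, \ldots, R_n - r)$, and the risky part of the portfolio $\overline{x} = (x_1, \ldots, x_n)$. Using the representation of $\overline{P}_p$, the problem becomes

$$\begin{aligned}
&\text{minimize} && E(1 - \langle \overline{x}, \overline{R} \rangle)_+^p \quad \text{over } \overline{x} \in \mathbb{R}^n \\
&\text{subject to} && E\langle \overline{x}, \overline{R} \rangle \ge 0.
\end{aligned} \tag{14}$$

If we find a solution $\overline{x}^*$ of this problem, then the optimal portfolio in problem (13) can be found as follows:

$$x_i^* = \frac{\delta\, \overline{x}_i^*}{E\langle \overline{x}^*, \overline{R} \rangle}, \quad i = 1, \ldots, n, \qquad x_0^* = 1 - \sum_{i=1}^{n} x_i^*.$$

Moreover, observe that the constraint $E\langle \overline{x}, \overline{R} \rangle \ge 0$ can be removed in (14) since the value of the objective function is not less than 1 if $E\langle \overline{x}, \overline{R} \rangle < 0$, which is not optimal. Thus, (14) becomes an unconstrained problem.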
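
To illustrate how the unconstrained form of (14) can be solved in practice, here is a minimal sketch for $p = 2$ with finitely many equally likely scenarios of adjusted risky returns. Plain gradient descent with a fixed step is a choice made here for simplicity; the function names and scenario data are hypothetical.

```python
def objective(y, scenarios):
    """Sample mean of (1 - <y, R>)_+^2 over equally likely scenarios."""
    total = 0.0
    for s in scenarios:
        total += max(1.0 - sum(a * b for a, b in zip(y, s)), 0.0) ** 2
    return total / len(scenarios)


def solve_problem_14(scenarios, iters=20000):
    """Minimize the objective of (14) for p = 2 by gradient descent, starting from y = 0."""
    n, m = len(scenarios[0]), len(scenarios)
    # Step 1/L, where L = 2 E|R|^2 bounds the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * sum(sum(a * a for a in s) for s in scenarios) / m)
    y = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for s in scenarios:
            slack = 1.0 - sum(a * b for a, b in zip(y, s))
            if slack > 0.0:  # only scenarios with positive slack contribute
                for i in range(n):
                    grad[i] -= 2.0 * slack * s[i] / m
        y = [a - step * g for a, g in zip(y, grad)]
    return y


def recover_portfolio(y, scenarios, delta):
    """Rescale a solution of (14) to weights (x_0, x_1, ..., x_n) satisfying (13)."""
    m = len(scenarios)
    mean_excess = sum(sum(a * b for a, b in zip(y, s)) for s in scenarios) / m
    risky = [delta * a / mean_excess for a in y]
    return [1.0 - sum(risky)] + risky


scenarios = [(0.10, 0.02), (-0.05, 0.03), (0.02, -0.04), (0.06, 0.05)]
weights = recover_portfolio(solve_problem_14(scenarios), scenarios, delta=0.01)
print(weights)
```

The rescaling step uses that the objective value is below 1 at the computed point, which forces the expected excess return of the unscaled solution to be positive.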


4 Dynamic Problems


This section illustrates how the developed theory can be used to give new elegant 
solutions of dynamic portfolio selection problems when an investor can continu- 
ously trade in the market. The results we obtain are not entirely new, but their proofs 
are considerably shorter and simpler than in the literature. 


4.1 A Continuous-Time Market Model and Two Investment Problems


Suppose there are two assets traded in the market: a riskless asset with price $B_t$ and a risky asset with price $S_t$ at time $t \in [0, \infty)$. The time runs continuously. Without loss of generality, we assume $B_t \equiv 1$. The price of the risky asset is modeled by a geometric Brownian motion with constant drift $\mu$ and volatility $\sigma$, i.e.

$$S_t = S_0 \exp\Big(\sigma W_t + \Big(\mu - \frac{\sigma^2}{2}\Big) t\Big), \qquad t \ge 0,$$

where $W_t$ is a Brownian motion (Wiener process). Without loss of generality, $S_0 = 1$. It is well-known that the process $S_t$ is the unique strong solution of the stochastic differential equation (SDE)

$$dS_t = S_t(\mu\,dt + \sigma\,dW_t).$$
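
That the explicit formula satisfies this SDE is a standard application of Itô's formula; the short computation is spelled out here for convenience, with $f(t, w) = S_0 \exp(\sigma w + (\mu - \sigma^2/2)t)$ introduced only as auxiliary notation.

```latex
% S_t = f(t, W_t) with f(t, w) = S_0 \exp(\sigma w + (\mu - \sigma^2/2) t)
\begin{aligned}
dS_t &= \partial_t f\,dt + \partial_w f\,dW_t + \tfrac{1}{2}\,\partial_{ww} f\,dt \\
     &= \Big(\mu - \frac{\sigma^2}{2}\Big) S_t\,dt + \sigma S_t\,dW_t + \frac{\sigma^2}{2} S_t\,dt \\
     &= S_t\,(\mu\,dt + \sigma\,dW_t).
\end{aligned}
```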


We consider the following two problems of choosing an optimal investment 
strategy in this market model. 


Problem 1 Suppose a trader can manage her portfolio dynamically on a time horizon $[0, T]$. A trading strategy is identified with a scalar control process $u_t$, which is equal to the amount of money invested in the risky asset at time $t$. The amount of money $v_t$ is invested in the riskless asset. The value $X_t^u = u_t + v_t$ of the portfolio with the starting value $X_0^u = x_0$ satisfies the controlled SDE

$$dX_t^u = u_t(\mu\,dt + \sigma\,dW_t), \qquad X_0^u = x_0. \tag{15}$$

This equation is well-known and it expresses the assumption that the trading strategy is self-financing, i.e. it has no external inflows or outflows of capital. Note that $v_t$ doesn't appear in the equation since it can be uniquely recovered as $v_t = X_t^u - u_t$.

To have $X^u$ correctly defined, we'll assume that $u_t$ is predictable with respect to the filtration generated by $W_t$ and $E\int_0^T u_t^2\,dt < \infty$. We'll also need to impose the following mild technical assumption:

$$E \exp\Big(\frac{\sigma^2 p^2}{2} \int_0^T \frac{u_t^2}{(1 - X_t^u)^2}\,dt\Big) < \infty. \tag{16}$$

The class of all the processes $u_t$ satisfying these assumptions will be denoted by $\mathscr{U}$. Actually, it can be shown that (16) can be removed without changing the class of optimal strategies in the problem formulated below, but to keep the presentation simple, we will require it to hold.

The problem consists in minimizing the buffered probability of loss by time $T$. So the goal of the trader is to solve the following control problem with some fixed $p \in (1, \infty)$:

$$V_1 = \inf_{u \in \mathscr{U}} \overline{P}_p(x_0 - X_T^u, 0). \tag{17}$$

For $p = 2$ this problem is equivalent to the problem of maximization of the monotone Sharpe ratio $S_2(X_T^u - x_0)$. Moreover, we'll also show that the same solution is obtained in the problem of maximization of the standard Sharpe ratio, $S(X_T^u - x_0)$. Note that we don't consider the case $p = 1$.

From (15) and (17), it is clear that without loss of generality we can (and will) assume $x_0 = 0$. It is also clear that there is no unique solution of problem (17): if some $u^*$ minimizes $\overline{P}_p(x_0 - X_T^u, 0)$ then so does any $u_t = c u_t^*$ with a constant $c > 0$. Hence, additional constraints have to be imposed if one wants to have a unique solution, for example a constraint on the expected return like $EX_T^u = x_0 + \delta$. This is similar to the standard Markowitz portfolio selection problem, as discussed in Sect. 2.1.


Problem 2 Suppose at time $t = 0$ a trader holds one unit of the risky asset with the starting price $S_0 = 1$ and wants to sell it better than some goal price $x > 1$. The asset is indivisible (e.g. a house) and can be sold only at once.


A selling strategy is identified with a Markov time of the process $S_t$. Recall that a random variable $\tau$ with values in $[0, \infty]$ is called a Markov time if the random event $\{\tau \le t\}$ is in the $\sigma$-algebra $\sigma(S_r;\ r \le t)$ for any $t \ge 0$. The notion of a Markov (stopping) time reflects the idea that no information about the prices in the future can be used at the moment when the trader decides to sell the asset. The random event $\{\tau = \infty\}$ is interpreted as the situation when the asset is never sold, and we set $S_\infty := 0$. We'll see that the optimal strategy in the problem we formulate below will, with positive probability, never sell the asset.

Let $\mathfrak{M}$ denote the class of all Markov times of the process $S_t$. We consider the following optimal stopping problem for $p \in (1, \infty)$:

$$V_2 = \inf_{\tau \in \mathfrak{M}} \overline{P}_p(x - S_\tau, 0), \tag{18}$$

i.e. minimization of the buffered probability to sell worse than for the goal price $x$. Similarly to Problem 1, in the case $p = 2$, it'll be shown that this problem is equivalent to maximization of the monotone Sharpe ratio $S_2(S_\tau - x)$, and the optimal strategy also maximizes the standard Sharpe ratio.


4.2 A Brief Literature Review 


Perhaps the most interesting thing to notice about the two problems is that they are “not standard” from the point of view of the stochastic control theory for diffusion processes and Brownian motion. Namely, they don't directly reduce to solutions of some PDEs, which for “standard” problems is possible via the Hamilton-Jacobi-Bellman equation. In Problems 1 and 2 (and also in the related problems of maximization of the standard Sharpe ratio), the HJB equation cannot be written because the objective function is not in the form of the expectation of some functional of the controlled process, i.e. not $EF(X_r^u;\ r \le T)$ or $EF(S_r;\ r \le \tau)$. Hence, another approach is needed to solve them.

Dynamic problems of maximization of the standard Sharpe ratio and related 
problems with mean-variance optimality criteria have been studied in the literature 
by several authors. We just briefly mention several of them. 

Richardson [19] was, probably, the first who solved a dynamic portfolio selection 
problem under a mean-variance criterion (the earlier paper of White [26] can also 
be mentioned); he used “martingale” methods. Li and Ng [11] studied a multi- 
asset mean-variance selection problem, which they solved through auxiliary control 
problems in the standard setting. A similar approach was also used in the recent 
papers by Pedersen and Peskir [15, 16]. The first of them provides a solution for the 
optimal selling problem (an analogue of our Problem 2), the second paper solves the 
portfolio selection problem (our Problem 1). There are other results in the literature, 
a comprehensive overview can be found in the above-mentioned papers by Pedersen 
and Peskir, and also in the paper [6]. 

It should also be mentioned that a large number of papers study the so-called problem of time inconsistency of the mean-variance and similar optimality criteria, which roughly means that at a time $t > t_0$ it turns out to be not optimal to follow the strategy which was optimal at time $t_0$. Such a contradiction doesn’t happen in
standard control problems for Markov processes, where the Bellman principle can 
be applied, but it is quite typical for non-standard problems. Several approaches 
to redefine the notion of an optimal strategy that would take into account time 
inconsistency are known: see, for example, the already mentioned papers [6, 15, 16] 
and the references therein. We will not deal with the issue of time inconsistency (our 
solutions are time inconsistent). 

Compared to the results in the literature, the solutions of Problems 1 and 2 in the case $p = 2$ readily follow from earlier results (e.g. from [15, 16, 19]); the other cases can also be studied by previously known methods. Nevertheless, the value of this paper is in the new approach to solving them through the monotone Sharpe ratio and buffered probabilities. This approach seems to be simpler than previous ones (the reader can observe how short the solutions presented below are compared to [15, 16]) and promising for more general settings.


4.3 Solution of Problem 1 


Theorem 5 The class of optimal control strategies in Problem 1 is given by

$$u_t^c = \frac{\mu}{\sigma^2 (p - 1)} \big(c - X_t^{u^c}\big),$$

where $c > 0$ can be an arbitrary constant. The process $Y_t^c = c - X_t^{u^c}$ is a geometric Brownian motion satisfying the SDE

$$\frac{dY_t^c}{Y_t^c} = -\frac{\mu^2}{\sigma^2 (p - 1)}\,dt - \frac{\mu}{\sigma (p - 1)}\,dW_t, \qquad Y_0^c = c.$$

Proof Assuming $x_0 = 0$, from the representation of $\overline{P}_p(X, x)$ we have

$$V_1 = \min_{u \in \mathscr{U}} \min_{c \ge 0} \|(1 - cX_T^u)_+\|_p = \min_{u \in \mathscr{U}} \|(1 - X_T^u)_+\|_p, \tag{19}$$

where in the second equality we used that the constant $c$ can be included in the control, since $cX^u = X^{cu}$. Denote $\tilde X_t^u = 1 - X_t^u$, so that the controlled process $\tilde X^u$ satisfies the equation

$$d\tilde X_t^u = -\mu u_t\,dt - \sigma u_t\,dW_t, \qquad \tilde X_0^u = 1.$$

Then

$$V_1^p = \min_{u \in \mathscr{U}} E|\tilde X_T^u|^p, \tag{20}$$

where $(\cdot)_+$ from (19) was removed since it is obvious that as soon as $\tilde X^u$ reaches zero, it is optimal to choose $u \equiv 0$ afterwards, so the process stays at zero.

Let $v_t = v_t(u) = -u_t / \tilde X_t^u$. Then for any $u \in \mathscr{U}$ we have

$$E|\tilde X_T^u|^p = E\Big[ Z_T \exp\Big( \int_0^T \big(\mu p v_s + \tfrac{1}{2}\sigma^2 (p^2 - p) v_s^2\big)\,ds \Big) \Big],$$

where $Z$ is the stochastic exponent process $Z = \mathcal{E}(\sigma p v)$. From Novikov's condition, which holds due to (16), $Z_t$ is a martingale and $EZ_T = 1$. By introducing the new measure $Q$ on the $\sigma$-algebra $\mathcal{F}_T = \sigma(W_t,\ t \le T)$ with the density $dQ = Z_T\,dP$ we obtain

$$E|\tilde X_T^u|^p = E^Q\Big[ \exp\Big( \int_0^T \big(\mu p v_s + \tfrac{1}{2}\sigma^2 (p^2 - p) v_s^2\big)\,ds \Big) \Big].$$

Clearly, this expression can be minimized by minimizing the integrand for each $t$, i.e. by

$$v_t^* = -\frac{\mu}{\sigma^2 (p - 1)} \quad \text{for all } t \in [0, T].$$

Obviously, it satisfies condition (16), so the corresponding control process

$$u_t^* = \frac{\mu}{\sigma^2 (p - 1)} \tilde X_t^{u^*} = \frac{\mu}{\sigma^2 (p - 1)} \big(1 - X_t^{u^*}\big)$$

is optimal in problem (20). Consequently, any control process $u_t^c = c u_t^*$, $c > 0$, will be optimal in (17). Since $X^{u^c} = cX^{u^*}$, we obtain the first claim of the theorem. The representation for $Y_t^c$ follows from straightforward computations.
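
As a by-product, the optimal value can be written in closed form. The computation below is a sketch that is not stated in the text above; it reads (20) as $V_1^p = \min_u E|\tilde X_T^u|^p$ and uses that the exponent is deterministic for the constant $v^* = -\mu / (\sigma^2 (p - 1))$.

```latex
\begin{aligned}
\mu p v^* + \tfrac{1}{2}\sigma^2 (p^2 - p)(v^*)^2
  &= -\frac{p \mu^2}{\sigma^2 (p - 1)} + \frac{p \mu^2}{2 \sigma^2 (p - 1)}
   = -\frac{p \mu^2}{2 \sigma^2 (p - 1)}, \\
V_1^p &= \exp\Big(-\frac{p \mu^2 T}{2 \sigma^2 (p - 1)}\Big),
\qquad
V_1 = \exp\Big(-\frac{\mu^2 T}{2 \sigma^2 (p - 1)}\Big).
\end{aligned}
% For p = 2, combining V_1^2 = e^{-\mu^2 T / \sigma^2} with
% 1 / (1 + S_2^2) = \overline{P}_2^2 gives the maximal monotone Sharpe ratio
% S_2 = \sqrt{e^{\mu^2 T / \sigma^2} - 1}.
```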


Corollary 1 Let $u_t^* = \frac{\mu}{\sigma^2}\big(c - X_t^{u^*}\big)$, with some $c > 0$, be an optimal control strategy in problem (17) for $p = 2$. Then the standard Sharpe ratio of $X_T^{u^*}$ is equal to its monotone Sharpe ratio, $S(X_T^{u^*}) = S_2(X_T^{u^*})$.

In particular, $u^*$ also maximizes the standard Sharpe ratio of the return $X_T^u$, i.e. $S(X_T^u) \le S(X_T^{u^*})$ for any $u \in \mathscr{U}$.


Proof Suppose there is $Y \in L^2$ such that $Y \le X_T^{u^*}$ and $S(Y) > S(X_T^{u^*})$. It is well-known that the market model we consider is no-arbitrage and complete. This implies that there exist $y_0 < 0$ and a control $u_t$ such that $X_0^u = y_0$ and $X_T^u = Y$. The initial capital $y_0$ is negative, because otherwise an arbitrage opportunity can be constructed. But then the capital process $X_t' = X_t^u - y_0$ would have a higher Sharpe ratio than $Y$ and hence a higher monotone Sharpe ratio than $X_T^{u^*}$. A contradiction. This proves the first claim of the corollary, and the second one obviously follows from it.


4.4 Solution of Problem 2 


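We'll assume that $x > 1$, $\mu \in \mathbb{R}$, $\sigma > 0$, $p > 1$ are fixed throughout and use the following auxiliary notation:

$$\gamma = \frac{2\mu}{\sigma^2}, \qquad C(b) = \left( \frac{b}{1 - \Big(\frac{x (b^{1-\gamma} - 1)}{b - x}\Big)^{\frac{1}{p-1}}} - x \right)^{-1} \quad \text{for } b \in [x, \infty).$$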

Theorem 6 The optimal selling time $\tau^*$ in problem (18) is as follows.


1. If $\mu \le 0$, then $\tau^*$ can be any Markov time: $\overline{P}_p(x - S_\tau, 0) = 1$ for any $\tau \in \mathfrak{M}$.

2. If $\mu \ge \frac{\sigma^2}{2}$, then $S_t$ reaches any level $x' > x$ with probability 1 and any stopping time of the form $\tau^* = \inf\{t \ge 0 : S_t = x'\}$ is optimal.

3. If $0 < \mu < \frac{\sigma^2}{2}$, then the optimal stopping time is

$$\tau^* = \inf\{t \ge 0 : S_t \ge b^*\},$$

where $b^* \in [x, \infty)$ is the point of minimum of the function

$$f(b) = (1 + C(b)(x - b))^p\, b^{\gamma - 1} + (1 + C(b)x)^p\, (1 - b^{\gamma - 1}), \qquad b \in [x, \infty),$$

and we set $\tau^* = +\infty$ on the random event $\{S_t < b^* \text{ for all } t \ge 0\}$.


Observe that if $0 < \mu < \frac{\sigma^2}{2}$, i.e. $\gamma \in (0, 1)$, then the function $f(b)$ attains its minimum on $[x, \infty)$, since it is continuous with the limit values $f(x) = f(\infty) = 1$.


Proof From the representation for $\overline{P}_p$ we have

$$V_2^p = \min_{c \ge 0} \inf_{\tau \in \mathfrak{M}} E|(1 + c(x - S_\tau))_+|^p.$$

Let $Y_t^c = 1 + c(x - S_t)$. Observe that if $\mu \le 0$, then $Y_t^c$ is a submartingale for any $c \ge 0$, and so by Jensen's inequality $|(Y_t^c)_+|^p$ is a submartingale as well. Hence for any $\tau \in \mathfrak{M}$ we have $E|(1 + c(x - S_\tau))_+|^p \ge 1$, and then $V_2 = 1$.

If $\mu \ge \frac{\sigma^2}{2}$, then from the explicit representation $S_t = \exp\big(\sigma W_t + (\mu - \frac{\sigma^2}{2})t\big)$ one can see that $S_t$ reaches any level $x' > 1$ with probability 1 (as the Brownian motion $W_t$ with non-negative drift does so). Then for any $x' > x$ we have $\overline{P}_p(x - S_{\tau_{x'}}, 0) = 0$, where $\tau_{x'}$ is the first moment of reaching $x'$.

In the case $\mu \in (0, \frac{\sigma^2}{2})$, for any $c \ge 0$, consider the optimal stopping problem

$$V_{2,c} = \inf_{\tau \in \mathfrak{M}} E|(1 + c(x - S_\tau))_+|^p.$$

This is an optimal stopping problem for a Markov process $S_t$. From the general theory (see e.g. [17]) it is well known that the optimal stopping time here is of the threshold type:

$$\tau_c = \inf\{t \ge 0 : S_t \ge b_c\},$$

where $b_c \in [x, x + \frac{1}{c}]$ is some optimal level, which has to be found. Then the distribution of $S_{\tau_c}$ is binomial: it assumes only two values $b_c$ and 0 with probabilities $p_c$ and $1 - p_c$, where $p_c = b_c^{\gamma - 1}$, as can be easily found from the general formulas for boundary crossing probabilities for a Brownian motion with drift. Consequently,

$$V_2^p = \inf_{b \ge x}\ \inf_{0 \le c \le (b - x)^{-1}} \Big( (1 + c(x - b))^p\, b^{\gamma - 1} + (1 + cx)^p\, (1 - b^{\gamma - 1}) \Big).$$

It is straightforward to find that for any $b \ge x$ the optimal $c^*(b)$ is given by $c^*(b) = C(b)$, which proves the claim of the theorem.


Corollary 2 Assume $\mu \in (0, \frac{\sigma^2}{2})$ and $p = 2$. Let $\tau^*$ denote the optimal stopping time from Theorem 6. Then the standard Sharpe ratio of $S_{\tau^*} - x$ is equal to its monotone Sharpe ratio, $S(S_{\tau^*} - x) = S_2(S_{\tau^*} - x)$. In particular, $\tau^*$ also maximizes the standard Sharpe ratio of $S_\tau - x$, i.e. $S(S_\tau - x) \le S(S_{\tau^*} - x)$ for any $\tau \in \mathfrak{M}$.

Moreover, in this case the optimal threshold $b^*$ can be found as the point of maximum of the function

$$g(b) = \frac{b^\gamma - x}{b^{\frac{\gamma + 1}{2}} (1 - b^{\gamma - 1})^{\frac{1}{2}}}.$$


Proof Suppose $Y \le S_{\tau^*} - x$. As shown above, it is enough to consider only $Y$ which are measurable with respect to the $\sigma$-algebra generated by the random variable $S_{\tau^*}$. Since $S_{\tau^*}$ has a binomial distribution, $Y$ should also have a binomial distribution, assuming values $y_1 \le b^* - x$ and $y_2 \le -x$ with the same probabilities $(b^*)^{\gamma - 1}$ and $1 - (b^*)^{\gamma - 1}$ as $S_{\tau^*}$ assumes the values $b^*$ and 0. Using this, it is now not difficult to see that $S(Y) \le S(S_{\tau^*} - x)$, which proves the first claim.

The second claim follows from the fact that for any stopping time of the form $\tau_b = \inf\{t \ge 0 : S_t = b\}$, $b \in [x, \infty)$, we have $S(S_{\tau_b} - x) = g(b)$.
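
For $p = 2$ the optimal threshold can be located numerically by maximizing $g(b)$. The sketch below uses a plain grid search and then checks, at the threshold found, that the minimum over $c$ of the two-point objective from the proof of Theorem 6 equals $1/(1 + g(b)^2)$. The parameter values $x = 1.5$, $\gamma = 0.5$, the grid settings and the function names are illustrative assumptions.

```python
def sharpe_of_threshold(b, x, gamma):
    """g(b): Sharpe ratio of S_tau - x when selling at the first time S reaches b > 1."""
    hit = b ** (gamma - 1.0)                 # probability of ever reaching level b
    mean = b * hit - x
    std = b * (hit * (1.0 - hit)) ** 0.5
    return mean / std


def min_over_c(b, x, gamma, p, iters=100):
    """Minimum over 0 <= c <= 1/(b - x) of the two-point objective for a fixed b > x."""
    hit = b ** (gamma - 1.0)

    def objective(c):
        return max(1.0 + c * (x - b), 0.0) ** p * hit + (1.0 + c * x) ** p * (1.0 - hit)

    lo, hi = 0.0, 1.0 / (b - x)
    for _ in range(iters):                   # ternary search on a convex function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    return objective(0.5 * (lo + hi))


def best_threshold(x, gamma, b_max=50.0, step=0.01):
    """Grid search for the maximizer of g on (x, b_max]."""
    grid = [x + step * k for k in range(1, int((b_max - x) / step) + 1)]
    return max(grid, key=lambda b: sharpe_of_threshold(b, x, gamma))


x, gamma = 1.5, 0.5
b_star = best_threshold(x, gamma)
g_star = sharpe_of_threshold(b_star, x, gamma)
print(b_star, g_star, min_over_c(b_star, x, gamma, 2), 1.0 / (1.0 + g_star ** 2))
```

With these illustrative parameters, setting $s = \sqrt{b}$ reduces the first-order condition for $g$ to $s^2 - 4.5s + 3 = 0$, so the grid maximizer should lie near $b \approx 13.59$.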


Appendix 


This appendix recalls some facts from convex optimization and related results which were used in the paper.


Duality in Optimization 


Let $\mathcal{Z}$ be a topological vector space and $f(z)$ a real-valued function on $\mathcal{Z}$. Consider the optimization problem

$$\text{minimize } f(z) \text{ over } z \in \mathcal{Z}. \tag{21}$$

A powerful method to analyze such an optimization problem consists in considering its dual problem. To formulate it, suppose that $f(z)$ can be represented in the form $f(z) = F(z, 0)$ for all $z \in \mathcal{Z}$, where $F(z, a) : \mathcal{Z} \times \mathcal{A} \to \mathbb{R}$ is some function, and $\mathcal{A}$ is another topological vector space (a convenient choice of $F$ and $\mathcal{A}$ plays an important role).

Let $\mathcal{A}^*$ denote the topological dual of $\mathcal{A}$. Define the Lagrangian $L : \mathcal{Z} \times \mathcal{A}^* \to \mathbb{R}$ and the dual objective function $g : \mathcal{A}^* \to \mathbb{R}$ by

$$L(z, u) = \inf_{a \in \mathcal{A}} \{F(z, a) + \langle a, u \rangle\}, \qquad g(u) = \inf_{z \in \mathcal{Z}} L(z, u).$$

Then the dual problem is formulated as the optimization problem

$$\text{maximize } g(u) \text{ over } u \in \mathcal{A}^*.$$

If we denote by $V_P$ and $V_D$ the optimal values of the primal and dual problems respectively (i.e. the infimum of $f(z)$ and the supremum of $g(u)$ respectively), then it is easy to see that $V_P \ge V_D$ always.
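
The inequality between the primal and dual values follows by taking $a = 0$ in the infimum that defines the Lagrangian; the one-line argument is written out here for convenience.

```latex
% For every z and u, take a = 0 in the infimum defining L(z, u):
g(u) = \inf_{z'} \inf_{a} \{F(z', a) + \langle a, u \rangle\}
     \le F(z, 0) + \langle 0, u \rangle = f(z),
\qquad \text{hence} \qquad
V_D = \sup_{u} g(u) \le \inf_{z} f(z) = V_P .
```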

We are generally interested in the case when the strong duality takes place, i.e. $V_P = V_D$, or, explicitly,

$$\min_{z \in \mathcal{Z}} f(z) = \max_{u \in \mathcal{A}^*} g(u). \tag{22}$$

Introduce the optimal value function $\varphi(a) = \inf_{z \in \mathcal{Z}} F(z, a)$. The following theorem provides a sufficient condition for the strong duality (22) (see Theorem 7 in [20]).


Theorem 7 Suppose $F$ is convex in $(z, a)$ and $\varphi(0) = \liminf_{a \to 0} \varphi(a)$. Then (22) holds.

Let us consider a particular case of problem (21) which includes constraints in
the form of equalities and inequalities. Assume that $\mathcal{Z} = L^p$ for some $p \in [1, \infty)$
and two functions $h_i\colon L^p \to L^{r_i}(\mathbb{R}^{n_i})$, $i = 1, 2$, are given (the spaces $L^p$ and $L^{r_i}$ are
not necessarily defined on the same probability space); below we write $g = h_1$
and $h = h_2$. Consider the problem

minimize $f(z)$ over $z \in L^p$
subject to $g(z) \le 0$ a.s.,
           $h(z) = 0$ a.s.


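This problem can be formulated as a particular case of the above abstract setting by
defining


$$F(z, a_1, a_2) = \begin{cases} f(z), & \text{if } g(z) \le a_1 \text{ and } h(z) = a_2 \text{ a.s.}, \\ +\infty, & \text{otherwise.} \end{cases}$$


The Lagrangian of this problem is

$$L(z, u_1, u_2) = \inf_{a_1, a_2} \{F(z, a_1, a_2) + \langle a_1, u_1\rangle + \langle a_2, u_2\rangle\}
= \begin{cases} f(z) + \langle g(z), u_1\rangle + \langle h(z), u_2\rangle, & \text{if } u_1 \ge 0 \text{ a.s.}, \\ -\infty, & \text{otherwise,} \end{cases}$$


where we denote $\langle a, u\rangle = E(\sum_i a_i u_i)$.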
So the dual objective function is


$$g(u, v) = \inf_{z \in L^p} \{ f(z) + \langle g(z), u\rangle + \langle h(z), v\rangle \} \quad \text{for } u \ge 0 \text{ a.s.},$$

and the dual optimization problem is


maximize $g(u, v)$ over $u \in (L^{r_1})^*$, $v \in (L^{r_2})^*$
subject to $u \ge 0$.
The strong duality equality then reads


$$\min\{ f(z) \mid g(z) \le 0,\ h(z) = 0\} = \max\{g(u, v) \mid u \ge 0\}.$$
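
As a sanity check of the strong duality equality, the following Python sketch works through a one-dimensional toy problem that is not taken from the paper: minimize $f(z) = z^2$ subject to $g(z) = 1 - z \le 0$, with no equality constraint. The function names and the grid search are illustrative assumptions; the closed form of the dual objective, $u - u^2/4$, follows from minimizing $z^2 + u(1 - z)$ over $z$.

```python
# Toy illustration of Lagrangian duality (hypothetical example, not from the paper):
#   minimize f(z) = z^2  subject to  g(z) = 1 - z <= 0.

def f(z):
    return z * z

def g(z):
    return 1.0 - z

def dual_objective(u):
    # inf over z of f(z) + u * g(z); the minimiser is z = u / 2,
    # which gives the closed form u - u^2 / 4.
    return u - u * u / 4.0

grid = [i / 1000.0 for i in range(-5000, 5001)]  # points in [-5, 5]

# Primal optimal value V_P: minimise f over the feasible grid points.
primal_value = min(f(z) for z in grid if g(z) <= 0.0)

# Dual optimal value V_D: maximise the dual objective over u >= 0.
dual_value = max(dual_objective(u) for u in grid if u >= 0.0)

print(primal_value, dual_value)
```

In this convex example both values coincide, so the duality gap $V_P - V_D$ vanishes; for non-convex problems one should expect only $V_P \ge V_D$.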


The Minimax Theorem 


Theorem 8 (Sion's Minimax Theorem, Corollary 3.3 in [25]) Suppose $X$, $Y$ are
convex spaces such that one of them is compact, and $f(x, y)$ is a function on $X \times Y$
such that $x \mapsto f(x, y)$ is quasi-concave and u.s.c. for each fixed $y$, and $y \mapsto f(x, y)$
is quasi-convex and l.s.c. for each fixed $x$. Then


$$\sup_{x \in X} \inf_{y \in Y} f(x, y) = \inf_{y \in Y} \sup_{x \in X} f(x, y).$$
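
The minimax equality can be illustrated numerically. The sketch below uses a hypothetical concave-convex function, $f(x, y) = -x^2 + xy + y^2$ on $[-1, 1]^2$, which is not an example from the paper, and compares both sides of the equality on a finite grid; a grid search is only an approximation of the suprema and infima.

```python
# Grid check of sup_x inf_y f = inf_y sup_x f for a function that is
# concave in x and convex in y (hypothetical example):
#   f(x, y) = -x^2 + x*y + y^2 on [-1, 1] x [-1, 1].

def f(x, y):
    return -x * x + x * y + y * y

grid = [i / 100.0 for i in range(-100, 101)]  # includes the saddle point 0

# Left-hand side: maximise over x the worst case over y.
maximin = max(min(f(x, y) for y in grid) for x in grid)

# Right-hand side: minimise over y the best case over x.
minimax = min(max(f(x, y) for x in grid) for y in grid)

print(maximin, minimax)
```

Both quantities equal the value of $f$ at the saddle point $(0, 0)$, which lies on the grid.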
On Chernoff's Test for a Fractional
Brownian Motion


Alexey Muravlev and Mikhail Zhitlukhin


Abstract We construct a sequential test for the sign of the drift of a fractional 
Brownian motion. We work in the Bayesian setting and assume the drift has a 
prior normal distribution. The problem reduces to an optimal stopping problem for a 
standard Brownian motion, obtained by a transformation of the observable process. 
The solution is described as the first exit time from some set, whose boundaries 
satisfy certain integral equations, which are solved numerically. 


1 Introduction 


This paper provides an overview of the results obtained in the paper [22].

Suppose one observes a fractional Brownian motion process (fBm) with linear 
drift and unknown drift coefficient. We are interested in sequentially testing the 
hypotheses that the drift coefficient is positive or negative. We consider a Bayesian 
setting where the drift coefficient has a prior normal distribution, and we use an
optimality criterion for a test which consists of a linear penalty for the duration of
observation and a penalty for a wrong decision proportional to the true value of the
drift coefficient. The main result of this paper describes the structure of the exact 
optimal test in this problem, i.e. specifies a time to stop observation and a rule to 
choose between the two hypotheses. 

The main novelty of our work compared to the large body of literature related 
to sequential tests (for an overview of the field, see e.g. [10, 20]) is that we work 
with fBm. To the best of our knowledge, this is the first non-asymptotic solution 
of a continuous-time sequential testing problem for this process. It is well known
that fBm is neither a Markov process nor a semimartingale, except in the particular
case when it is a standard Brownian motion (standard Bm). As a consequence, many
standard tools of stochastic calculus and stochastic control (Itô's formula, the HJB
equation, etc.) cannot be directly applied in models based on fBm. Fortunately, in
the problem we consider it turns out to be possible to change the original problem for
fBm so that it becomes tractable. One of the key steps is a general transformation
outlined in the note [13], which allows one to reduce sequential testing problems for
fBm to problems for diffusion processes.

In the literature, the result which is most closely related to ours is the sequential 
test proposed by Chernoff [3], which has exactly the same setting and uses the same 
optimality criterion, but considers only standard Bm. For a prior normal distribution 
of the drift coefficient, Chernoff and Breakwell [1, 4] found asymptotically optimal 
sequential tests when the variance of the drift goes to zero or infinity. 

Let us mention two other recent results in the sequential analysis of fBm, 
related to estimation of its drift coefficient. Cetin et al. [2] considered a sequential 
estimation problem assuming a normal prior distribution of the drift with a 
quadratic or a $\delta$-function penalty for a wrong estimate and a linear penalty for
observation time. They proved that in their setting the optimal stopping time is non- 
random. Gapeev and Stoev [8] studied sequential testing and changepoint detection 
problems for Gaussian processes, including fBm. They showed how those problems 
can be reduced to optimal stopping problems and found asymptotics of optimal 
stopping boundaries. There are many more results related to fixed-sample (i.e. 
non-sequential) statistical analysis of fBm. See, for example, Part II of the recent 
monograph [19], which discusses statistical methods for fBm in detail.

The remaining part of our paper is organized as follows. Section 2 formulates the 
problem. Section 3 describes a transformation of the original problem to an optimal 
stopping problem for a standard Bm and introduces auxiliary processes which are 
needed to construct the optimal sequential test. The main result of the paper—the 
theorem which describes the structure of the optimal sequential test—is presented 
in Sect. 4, together with a numerical solution. 


2 Decision Rules and Their Optimality 


Recall that the fBm $B^H_t$, $t \ge 0$, with Hurst parameter $H \in (0, 1)$ is a zero-mean
Gaussian process with the covariance function


$$\mathrm{cov}(B^H_t, B^H_s) = \frac{1}{2}\left(s^{2H} + t^{2H} - |t - s|^{2H}\right), \qquad t, s \ge 0.$$


In the particular case H = 1/2 this process is a standard Brownian motion (standard 
Bm) and has independent increments; its increments are positively correlated in the 
case H > 1/2 and negatively correlated in the case H < 1/2. 
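
The covariance function makes the sign of the correlation between increments easy to check numerically. The following sketch (function names are illustrative, not from the paper) evaluates the covariance and the covariance of two adjacent unit increments, $B^H_1 - B^H_0$ and $B^H_2 - B^H_1$, which equals $2^{2H-1} - 1$.

```python
def fbm_cov(t, s, H):
    """Covariance of fBm: 0.5 * (s^(2H) + t^(2H) - |t - s|^(2H))."""
    return 0.5 * (s ** (2 * H) + t ** (2 * H) - abs(t - s) ** (2 * H))

def increment_cov(H):
    """Covariance of the adjacent unit increments B_1 - B_0 and B_2 - B_1."""
    # cov(B_1, B_2 - B_1) = cov(B_1, B_2) - cov(B_1, B_1), since B_0 = 0.
    return fbm_cov(1.0, 2.0, H) - fbm_cov(1.0, 1.0, H)

for H in (0.3, 0.5, 0.7):
    print(H, fbm_cov(2.0, 3.0, H), increment_cov(H))
```

For $H = 1/2$ the covariance reduces to $\min(t, s)$ and the increment covariance vanishes, in agreement with the independence of increments of a standard Bm.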

Suppose one observes the stochastic process


$$Z_t = \theta t + B^H_t, \qquad t \ge 0,$$


where $H \in (0, 1)$ is known, and $\theta$ is a random variable independent of $B^H$ and
having a normal distribution with known mean $\mu \in \mathbb{R}$ and known variance $\sigma^2 > 0$.
It is assumed that neither the value of $\theta$ nor the value of $B^H$ can be observed
directly, but the observer wants to determine whether the value of $\theta$ is positive or
negative based on the information conveyed by the combined process $Z_t$. We will
look for a sequential test for the hypothesis $\theta > 0$ versus the alternative $\theta \le 0$.
By a sequential test we mean a pair $\delta = (\tau, d)$, which consists of a stopping time $\tau$
of the filtration $\mathcal{F}^Z_t$ generated by $Z$, and an $\mathcal{F}^Z_\tau$-measurable function $d$ assuming
values $\pm 1$. The stopping time is the moment of time when observation is terminated
and a decision about the hypotheses is made; the value of $d$ shows which of them is
accepted.

We will use the criterion of optimality of a decision rule consisting in minimizing
the linear penalty for observation time and the penalty for a wrong decision
proportional to the absolute value of $\theta$. Namely, with each decision rule $\delta$ we
associate the risk


$$R(\delta) = E\big(\tau + |\theta|\, I(d \ne \mathrm{sgn}(\theta))\big).$$


The problem consists in finding $\delta^*$ that minimizes $R(\delta)$ over all decision rules.

This problem was proposed by Chernoff in [3] for standard Bm, and we refer the
reader to that paper for a rationale for this setting. The subsequent papers [1, 4, 5]
include results about the asymptotics of the optimal test and its other properties,
including a comparison with Wald's sequential probability ratio test. Our paper [21]
contains a result which makes it possible to find the exact (non-asymptotic) optimal
test by a relatively simple numerical procedure.


3 Reduction to an Optimal Stopping Problem 


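From the relation $|\theta|\, I(d \ne \mathrm{sgn}(\theta)) = \theta^+ I(d = -1) + \theta^- I(d = 1)$, where $\theta^+ =
\max(\theta, 0)$, $\theta^- = -\min(\theta, 0)$, and from the fact that $d$ is $\mathcal{F}^Z_\tau$-measurable, one can see
that the optimal decision rule should be looked for among rules $(\tau, d)$ in which $d$
attains $\min(E(\theta^- \mid \mathcal{F}^Z_\tau), E(\theta^+ \mid \mathcal{F}^Z_\tau))$, i.e. $d = 1$ if $E(\theta^- \mid \mathcal{F}^Z_\tau) < E(\theta^+ \mid \mathcal{F}^Z_\tau)$
and $d = -1$ otherwise. Hence, it will be enough to solve the optimal
stopping problem which consists in finding a stopping time $\tau^*$ such that $R(\tau^*) =
\inf_\tau R(\tau)$, where


$$R(\tau) = E\big(\tau + \min(E(\theta^- \mid \mathcal{F}^Z_\tau), E(\theta^+ \mid \mathcal{F}^Z_\tau))\big)$$
(for brevity, we use the same notation $R$ for the functional associated with a
decision rule and the functional associated with a stopping time).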
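
To make the terminal decision concrete, the sketch below assumes, purely for illustration, that the conditional law of $\theta$ at the stopping time is normal with mean `m` and standard deviation `s` (both names are hypothetical). For a normal law, $E\theta^+ = m\Phi(m/s) + s\varphi(m/s)$ and $E\theta^- = -m\Phi(-m/s) + s\varphi(m/s)$, so that $E\theta^+ - E\theta^- = m$ and the minimum equals $(E|\theta| - |m|)/2$.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_parts(m, s):
    """Return (E theta^+, E theta^-) for theta ~ N(m, s^2) with s > 0."""
    pos = m * norm_cdf(m / s) + s * norm_pdf(m / s)
    neg = -m * norm_cdf(-m / s) + s * norm_pdf(m / s)
    return pos, neg

def terminal_decision(m, s):
    """Decision d in {+1, -1} attaining min(E theta^-, E theta^+), and the minimum."""
    pos, neg = expected_parts(m, s)
    d = 1 if neg < pos else -1  # accept theta > 0 only if that costs less
    return d, min(pos, neg)

print(terminal_decision(0.3, 1.2))
print(terminal_decision(-0.5, 2.0))
```

Under this assumption the decision is simply the sign of the conditional mean, and the penalty term depends on the observations only through $|m|$.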


We will transform the expression inside the expectation in $R(\tau)$ to the value of
some process constructed from a standard Bm. Introduce the process $X_t$, $t \ge 0$, by


$$X_t = C_H \int_0^t K_H(t, s)\, dZ_s$$


with the integration kernel $K_H(t, s) = (t - s)^{\frac12 - H}\, {}_2F_1\big(\tfrac12 - H,\ \tfrac12 - H,\ \tfrac32 - H,\ \tfrac{s - t}{s}\big)$,
where ${}_2F_1$ is the Gauss (ordinary) hypergeometric function, and the constant


$$C_H = \left(\frac{\Gamma(2 - 2H)}{2H\, \Gamma(\frac12 + H)\, (\Gamma(\frac32 - H))^3}\right)^{1/2};$$

as usual, $\Gamma$ denotes the gamma function.


As follows from [9] (see also earlier results [12, 14]), $B_t = C_H \int_0^t K_H(t, s)\, dB^H_s$
is a standard Bm, and by straightforward computation we obtain the representation


$$dX_t = dB_t + \theta L_H t^{\frac12 - H}\, dt,$$


and the filtrations of the processes $Z_t$ and $X_t$ coincide. The constant $L_H$ in the above
formula is defined by $L_H = \big(2H (\tfrac32 - H)\, B(\tfrac12 + H,\ 2 - 2H)\big)^{-1/2}$, where $B$ is the
beta function.
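
The two constants are straightforward to evaluate. The sketch below reads them as $C_H = \big(\Gamma(2-2H) / (2H\,\Gamma(\frac12+H)\,\Gamma(\frac32-H)^3)\big)^{1/2}$ and $L_H = \big(2H(\frac32-H)B(\frac12+H, 2-2H)\big)^{-1/2}$; the function names are illustrative. For $H = 1/2$ both constants equal 1, which matches the fact that in this case $X_t = Z_t$ and $dX_t = dB_t + \theta\, dt$.

```python
import math

def beta_fn(a, b):
    """Euler beta function B(a, b) expressed through gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def C_H(H):
    """Normalising constant of the integral transformation, 0 < H < 1."""
    denom = 2.0 * H * math.gamma(0.5 + H) * math.gamma(1.5 - H) ** 3
    return math.sqrt(math.gamma(2.0 - 2.0 * H) / denom)

def L_H(H):
    """Drift constant in dX_t = dB_t + theta * L_H * t^(1/2 - H) dt."""
    return (2.0 * H * (1.5 - H) * beta_fn(0.5 + H, 2.0 - 2.0 * H)) ** -0.5

for H in (0.3, 0.5, 0.7):
    print(H, C_H(H), L_H(H))
```

Expanding the beta function shows that $L_H^2 = \Gamma(\frac32 - H) / (2H\,\Gamma(\frac12 + H)\,\Gamma(2 - 2H))$, which can serve as an independent check of an implementation.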

Now one can find that the conditional distribution $\mathrm{Law}(\theta \mid \mathcal{F}^X_\tau)$ is normal and
transform the expression for the risk to


$$R(\tau) = E\Big(\tau - \frac{1}{2 L_H} |Y_\tau|\Big) + \mathrm{const},$$


where const denotes some constant (depending on $\mu$, $\sigma$, $H$), the value of which is
not essential for what follows, and $Y_t$, $t \ge 0$, is the process satisfying the SDE


$$dY_t = \frac{t^{\frac12 - H}}{(\sigma L_H)^{-2} + t^{2 - 2H}/(2 - 2H)}\, d\widetilde{B}_t, \qquad Y_0 = \mu L_H,$$

where $\widetilde{B}$ is another standard Bm, the innovation Brownian motion (see e.g.
Chapter 7.4 in [11]). For brevity, denote $\gamma = (2 - 2H)^{-1}$. Then under the following
monotone change of time


$$t(r) = \left(\frac{(2 - 2H)\, r}{(\sigma L_H)^2 (1 - r)}\right)^{\gamma}, \qquad r \in [0, 1),$$


where $t$ runs through the half-interval $[0, \infty)$ when $r$ runs through $[0, 1)$, the process

$$W_r = (\sigma L_H)^{-1} Y_{t(r)} - \mu \sigma^{-1}$$


is a standard Bm in $r \in [0, 1)$, and the filtrations $\mathcal{F}^W_r$ and $\mathcal{F}^X_{t(r)}$ coincide. Denote

$$M_{\sigma,H} = \frac{2}{\sigma} \left(\frac{2 - 2H}{\sigma^2 L_H^2}\right)^{\gamma}.$$

Then the optimal stopping problem for $X$ in $t$-time is equivalent to the following
optimal stopping problem for $W$ in $r$-time:


$$V = \inf_{\rho < 1} E\Big(M_{\sigma,H} \Big(\frac{\rho}{1 - \rho}\Big)^{\gamma} - \Big|W_\rho + \frac{\mu}{\sigma}\Big|\Big).$$ (1)

Namely, if $\rho^*$ is the optimal stopping time for $V$, then the optimal decision rule
$\delta^* = (\tau^*, d^*)$ is given by


$$\tau^* = t(\rho^*), \qquad d^* = I(a_{\tau^*} > 0) - I(a_{\tau^*} \le 0).$$ (2)
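
The change of time can be checked numerically. The sketch below assumes the reading $t(r) = \big((2-2H)\,r / ((\sigma L_H)^2 (1-r))\big)^{\gamma}$ with $\gamma = 1/(2-2H)$, and takes the constant $L_H$ as a plain number `L`; all function names are illustrative. The inverse map $r(t) = u/(A+u)$, with $u = t^{2-2H}/(2-2H)$ and $A = (\sigma L_H)^{-2}$, should coincide with the quadratic variation of $(\sigma L_H)^{-1} Y$ accumulated up to time $t$, which is what makes $W$ a standard Bm in $r$-time.

```python
def t_of_r(r, sigma, L, H):
    """Time change t(r) for r in [0, 1)."""
    gamma = 1.0 / (2.0 - 2.0 * H)
    return ((2.0 - 2.0 * H) * r / ((sigma * L) ** 2 * (1.0 - r))) ** gamma

def r_of_t(t, sigma, L, H):
    """Inverse time change: r = u / (A + u)."""
    u = t ** (2.0 - 2.0 * H) / (2.0 - 2.0 * H)
    A = (sigma * L) ** -2
    return u / (A + u)

def quadratic_variation(t, sigma, L, H, steps=20000):
    """Midpoint-rule approximation of the quadratic variation of Y / (sigma * L) on [0, t]."""
    A = (sigma * L) ** -2
    h = t / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        u = s ** (2.0 - 2.0 * H) / (2.0 - 2.0 * H)
        # squared diffusion coefficient of Y, divided by (sigma * L)^2
        total += A * s ** (1.0 - 2.0 * H) / (A + u) ** 2 * h
    return total

sigma, L, H = 1.3, 0.9, 0.3   # arbitrary illustrative values
t = t_of_r(0.4, sigma, L, H)
print(t, r_of_t(t, sigma, L, H), quadratic_variation(t, sigma, L, H))
```

The last two printed numbers should agree up to the quadrature error, and both should be close to the chosen value $r = 0.4$.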


4 The Main Results 


In this section we formulate a theorem about the solution of problem (1), which
gives an optimal sequential test through transformation (2). Throughout we will
assume that $\sigma$ and $H$ are fixed and will denote the function


$$f(t) = M_{\sigma,H} \Big(\frac{t}{1 - t}\Big)^{\gamma}.$$


It is well known that under general conditions the solution of an optimal stopping
problem for a Markov process can be represented as the first time when the process
enters a certain set (a stopping set). Namely, let us first rewrite our problem in the
Markov setting by allowing the process $W_r$ to start from any point $(t, x) \in [0, 1) \times \mathbb{R}$:


$$V(t, x) = \inf_{\rho < 1 - t} E\big(f(t + \rho) - |W_\rho + x|\big) - f(t).$$ (3)


For example, for the quantity $V$ from (1) we have $V = V(0, \mu/\sigma)$. We subtract $f(t)$
in the definition of $V(t, x)$ to make the function $V(t, x)$ bounded. For $t = 1$ we
define $V(1, x) = -|x|$.

The following theorem describes the structure of the optimal stopping time in 
problem (3). In its statement, we set 


ee ee 
to = to( i max( , aS! 


Obviously, $t_0 > 0$ for $H < \frac12$ and $t_0 = 0$ for $H \ge \frac12$.


Theorem 1 


1) There exists a function $A(t)$ defined on $(t_0, 1]$, which is continuous, strictly
decreasing, and strictly positive for $t < 1$ with $A(1) = 0$, such that for any
$t > t_0$ and $x \in \mathbb{R}$ the optimal stopping time in the problem (3) is given by


$$\rho^*(t, x) = \inf\{s \ge 0 : |W_s + x| \ge A(t + s)\}.$$
Moreover, for any $t \in (t_0, 1]$ the function $A(t)$ satisfies the inequality


A(t) < a 4 
= Mo Ht?) (4) 
2) The function $A(t)$ is the unique function which is continuous, non-negative,
satisfies (4), and solves the integral equation


$$G(t, A(t)) = \int_t^1 F(t, A(t), s, A(s))\, ds, \qquad t \in (t_0, 1),$$ (5)


with the functions $G(t, x) = E|\xi\sqrt{1 - t} + x| - x$ and $F(t, x, s, y) =
f'(s)\, P(|\xi\sqrt{s - t} + x| \le y)$ for a standard normal random variable $\xi$.
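
Both functions entering the integral equation have closed forms in terms of the standard normal distribution, which is convenient for any numerical scheme. The sketch below is an illustration under the reading $G(t, x) = E|\xi\sqrt{1-t} + x| - x$ and $F(t, x, s, y) = f'(s)\,P(|\xi\sqrt{s-t} + x| \le y)$; the function names are hypothetical, and the derivative $f'$ is passed in as a callable because it depends on the constant $M_{\sigma,H}$. It uses $E|s\xi + x| = s\sqrt{2/\pi}\,e^{-x^2/(2s^2)} + x(2\Phi(x/s) - 1)$.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def G(t, x):
    """G(t, x) = E|xi * sqrt(1 - t) + x| - x for standard normal xi, 0 <= t < 1."""
    s = math.sqrt(1.0 - t)
    mean_abs = (s * math.sqrt(2.0 / math.pi) * math.exp(-x * x / (2.0 * s * s))
                + x * (2.0 * norm_cdf(x / s) - 1.0))
    return mean_abs - x

def hit_prob(t, x, s, y):
    """P(|xi * sqrt(s - t) + x| <= y) for s > t and y >= 0."""
    d = math.sqrt(s - t)
    return norm_cdf((y - x) / d) - norm_cdf((-y - x) / d)

def F(t, x, s, y, f_prime):
    """Integrand of the integral equation; f_prime is the derivative of f."""
    return f_prime(s) * hit_prob(t, x, s, y)

print(G(0.0, 0.0), hit_prob(0.0, 0.0, 1.0, 1.0))
```

For instance, `G(t, 0)` equals $\sqrt{2(1-t)/\pi}$, the mean absolute value of a centred normal variable with variance $1 - t$.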


Regarding the statement of this theorem, first note that for $H < \frac12$ the stopping
boundary $A(t)$ is described only for $t > t_0 > 0$. The main reason is that the
method of proof we use to show that $A(t)$ is continuous and satisfies the integral
equation requires it to be of bounded variation (at least, locally). In particular, this is
a sufficient condition for applicability of the Itô formula with local time on curves
[16], which is used in the proof. In the case $H \ge \frac12$, and for $t > t_0$ in the case $H < \frac12$,
by a direct probabilistic argument we can prove that $A(t)$ is monotone and therefore
has bounded variation; this argument, however, doesn't work for $t < t_0$ in the case
$H < \frac12$ and, as a formal numerical solution shows, the boundary $A(t)$ seems to be
indeed not monotone in that case. Of course, the assumption of bounded variation
can be relaxed while the Itô formula can still be applicable (see e.g. [6, 7, 16]);
however, verification of weaker sufficient conditions is problematic. Although the
general scheme to obtain integral equations of type (5) and prove uniqueness of their
solutions was discovered quite a while ago (the first full result was obtained
by Peskir for the optimal stopping problem for American options, see [17]), and
has been used many times in the literature for various optimal stopping problems
(a large number of examples can be found in [18]), we are unaware of any of its
applications in the case when stopping boundaries are not monotone and/or cannot
be transformed to monotone ones by space and/or time change. Nevertheless, a
formal numerical solution of the integral equation produces stopping boundaries
which "look smooth", but we admit that a rigorous proof of this fact in the
above-mentioned cases remains an open question.

Note also that in the case $H \ge \frac12$ the space-time transformation we apply to pass
from the optimal stopping problem for the process $a(t)$ to the problem for $W_r$ is
essential from this point of view, because the boundaries in the problem for $a(t)$ are
not monotone. Moreover, they are not monotone even in the case $H = \frac12$ when $a(t)$
is obtained by simply shifting $X_t$ in time and space, see [3, 21].

The second remark we would like to make is that in the case $H > \frac12$ we do not
know whether $A(0)$ is finite. In the case $H = \frac12$ the finiteness of $A(0)$ follows from
inequality (4), which is proved by a direct argument based on comparison with a
simpler optimal stopping problem (one can see that it extends to $t = 0$). It seems
that a deeper analysis may be required for the case $H > \frac12$, which is beyond this
paper.


Fig. 1 The stopping boundary $A(t)$ for different values of $H$ and $\sigma = 1$


Figure 1 shows the stopping boundary $A(t)$ for different values of $H$ computed
by solving Eq. (5) numerically. A description of the numerical method, based on
"backward induction", can be found, for example, in [15].


Abstract These lecture notes are based on Yang's talk at the MATRIX program
Geometric R-Matrices: from Geometry to Probability, at the University of
Melbourne, Dec. 18-22, 2017, and Zhao's talk at Perimeter Institute for Theoretical
Physics in January 2018. We give an introductory survey of the results in Yang and
Zhao (Quiver varieties and elliptic quantum groups, 2017. arXiv:1708.01418). We
discuss a sheafified elliptic quantum group associated to any symmetric Kac-Moody 
Lie algebra. The sheafification is obtained by applying the equivariant elliptic 
cohomological theory to the moduli space of representations of a preprojective 
algebra. By construction, the elliptic quantum group naturally acts on the equivariant 
elliptic cohomology of Nakajima quiver varieties. As an application, we obtain 
a relation between the sheafified elliptic quantum group and the global affine 
Grassmannian over an elliptic curve. 


1 Motivation 


The parallelism of the following three different kinds of mathematical objects was
observed in [11] by Ginzburg-Kapranov-Vasserot.

(i): 1-dimensional algebraic groups: $\mathbb{C}$, $\mathbb{C}^*$, $E$

(ii): Quantum groups: $Y_\hbar(\mathfrak{g})$, $U_q(L\mathfrak{g})$, $E_{\hbar,\tau}(\mathfrak{g})$

(iii): Oriented cohomology theories: CH, K, Ell

Here, the correspondence (i)<->(iii) is well known in algebraic topology and goes back
to the work of Quillen, Hirzebruch, et al. A similar correspondence also exists in
algebraic geometry thanks to the oriented cohomology theories (OCT) of Levine
and Morel [15]. The algebraic OCT associated to $\mathbb{C}$, $\mathbb{C}^*$ and $E$ are, respectively, the
intersection theory (Chow groups) CH, the algebraic K-theory K and the algebraic
elliptic cohomology Ell.

The correspondence (i)<->(ii) was introduced to the mathematical community
by Drinfeld [6]. Roughly speaking, the quantum group in (ii) quantizes the Lie
algebra of maps from the algebraic group in (i) to a Lie algebra $\mathfrak{g}$. The quantization
is obtained from the solutions to the quantum Yang-Baxter equation


$$R_{12}(u) R_{13}(u + v) R_{23}(v) = R_{23}(v) R_{13}(u + v) R_{12}(u).$$ (QYBE)
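
As a concrete instance, the simplest rational solution of the QYBE is Yang's R-matrix $R(u) = u \cdot \mathrm{Id} + P$, where $P$ is the permutation operator on $\mathbb{C}^2 \otimes \mathbb{C}^2$. This standard example is not discussed in the text; the sketch below, with illustrative helper names, verifies the equation on $(\mathbb{C}^2)^{\otimes 3}$ using exact integer arithmetic.

```python
from itertools import product

DIM = 2
BASIS = list(product(range(DIM), repeat=3))      # basis of C^2 (x) C^2 (x) C^2
INDEX = {b: k for k, b in enumerate(BASIS)}
N = len(BASIS)

def identity():
    return [[1 if i == j else 0 for j in range(N)] for i in range(N)]

def swap(a, b):
    """Permutation operator exchanging tensor factors a and b (0-based)."""
    P = [[0] * N for _ in range(N)]
    for k, basis_vector in enumerate(BASIS):
        image = list(basis_vector)
        image[a], image[b] = image[b], image[a]
        P[INDEX[tuple(image)]][k] = 1
    return P

def R(a, b, u):
    """Yang's rational R-matrix u * Id + P acting on tensor factors a and b."""
    I, P = identity(), swap(a, b)
    return [[u * I[i][j] + P[i][j] for j in range(N)] for i in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def qybe_holds(u, v):
    """Check R12(u) R13(u+v) R23(v) == R23(v) R13(u+v) R12(u)."""
    lhs = matmul(matmul(R(0, 1, u), R(0, 2, u + v)), R(1, 2, v))
    rhs = matmul(matmul(R(1, 2, v), R(0, 2, u + v)), R(0, 1, u))
    return lhs == rhs

print(qybe_holds(3, 5), qybe_holds(-2, 7))
```

The Yangian corresponds to rational solutions of this kind, while the trigonometric and elliptic solutions require more elaborate matrices.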


The Yangian $Y_\hbar(\mathfrak{g})$, quantum loop algebra $U_q(L\mathfrak{g})$, and elliptic quantum group
$E_{\hbar,\tau}(\mathfrak{g})$ are respectively obtained from the rational, trigonometric and elliptic
solutions of the QYBE. There is a dynamical elliptic quantum group associated to
a general symmetrizable Kac-Moody Lie algebra, which is obtained from solutions
to the dynamical Yang-Baxter equation.

The correspondence (ii)<->(iii) is the main subject of this note, and originated
from the work of Nakajima [21], Varagnolo [24], Maulik-Okounkov [17], and
many others. Without going into the details, the quantum group in (ii) acts on
the corresponding oriented cohomology of the Nakajima quiver varieties recalled
below.

Let $Q = (I, H)$ be a quiver, with $I$ being the set of vertices and $H$ being the
set of arrows. Let $Q^\heartsuit$ be the framed quiver, schematically illustrated below. For any
dimension vector $(v, w)$ of $Q^\heartsuit$, we have the Nakajima quiver variety $\mathfrak{M}(v, w)$.


[Figure: the quiver $Q$ and the framed quiver $Q^\heartsuit$]


Denote $\mathfrak{M}(w) = \coprod_{v \in \mathbb{N}^I} \mathfrak{M}(v, w)$. We follow the terminology and notation
of [21] on quiver varieties, and refer the reader to loc. cit. for the details.
Nevertheless, we illustrate them by the following examples, and fix some conventions
in Sect. 2.


Example 1 


1. Let $Q$ be the quiver with one vertex and no arrows. Let $(v, w) = (r, n)$, with $0 \le
r \le n$. Then $\mathfrak{M}(r, n) \cong T^*\mathrm{Gr}(r, n)$, the cotangent bundle of the Grassmannian.

2. Let $Q$ be the Jordan quiver with one vertex and one self-loop. Let $(v, w) =
(n, 1)$, with $n \in \mathbb{N}$. Then $\mathfrak{M}(n, 1) \cong \mathrm{Hilb}^n(\mathbb{C}^2)$, the Hilbert scheme of $n$ points
on $\mathbb{C}^2$.
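
Both examples are consistent with the standard dimension formula for a nonempty Nakajima quiver variety, $\dim \mathfrak{M}(v, w) = 2\big(\sum_i v^i w^i + \sum_{h \in H} v^{\mathrm{out}(h)} v^{\mathrm{in}(h)} - \sum_i (v^i)^2\big)$. This formula is quoted here as background and is not stated in the text; the function below is an illustrative sketch with hypothetical names.

```python
def quiver_variety_dim(arrows, v, w):
    """Expected dimension of the Nakajima quiver variety M(v, w).

    arrows: list of (source, target) vertex indices of the quiver Q;
    v, w: dimension vectors given as lists indexed by the vertices.
    """
    framing = sum(vi * wi for vi, wi in zip(v, w))   # framing maps
    edges = sum(v[a] * v[b] for a, b in arrows)      # maps along the arrows of Q
    gauge = sum(vi * vi for vi in v)                 # dimension of GL_v
    return 2 * (framing + edges - gauge)

# Example 1: one vertex, no arrows, (v, w) = (r, n) = (2, 5): dim T*Gr(2, 5) = 2 * 2 * 3.
print(quiver_variety_dim([], [2], [5]))

# Example 2: Jordan quiver, (v, w) = (n, 1) = (4, 1): dim Hilb^4(C^2) = 2 * 4.
print(quiver_variety_dim([(0, 0)], [4], [1]))
```

The two printed values agree with $\dim T^*\mathrm{Gr}(r, n) = 2r(n - r)$ and $\dim \mathrm{Hilb}^n(\mathbb{C}^2) = 2n$.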


[Figure: the framed quivers of Example (1) and Example (2)]


When $Q$ has no edge-loops, Nakajima in [21] constructed an action of $U_q(L\mathfrak{g})$
on the equivariant K-theory of $\mathfrak{M}(w)$, with $\mathfrak{g}$ being the symmetric Kac-Moody
Lie algebra associated to $Q$. Varagnolo in [24] constructed an action of $Y_\hbar(\mathfrak{g})$
on the equivariant cohomology of the Nakajima quiver varieties. For a general
quiver $Q$, Maulik-Okounkov in [17] constructed a bigger Yangian $Y_{MO}$ acting on
the equivariant cohomology of $\mathfrak{M}(w)$ using a geometric R-matrix formalism. See
[1, 22] for the geometric R-matrix construction in the trigonometric case (K-theoretic
stable envelope) and in the elliptic case (elliptic stable envelope).

The goal of the present notes is to explain a direct construction of the cor- 
respondence from (iii) to (ii) above, using cohomological Hall algebra (CoHA) 
following [28]. Most of the constructions are direct generalizations of Schiffmann 
and Vasserot [23]. Closely related is the CoHA of Kontsevich-Soibelman [14], 
defined to be the critical cohomology (cohomology valued in a vanishing cycle) 
of the moduli space of representations of a quiver with potential. A relation between 
the CoHA in the present note and the CoHA from [14] is explained in [25]. 

This approach to studying quantum groups has the following advantages.


• The construction works for any oriented cohomology theory, beyond CH, K, Ell.
One interesting example is the Morava K-theory. The new quantum groups
obtained via this construction are expected to be related to Lusztig's character
formulas [28, §6].

• In the case of Ell, the construction gives a sheafified elliptic quantum group, as
well as an action of $E_{\tau,\hbar}(\mathfrak{g})$ on the equivariant elliptic cohomology of Nakajima
quiver varieties, as will be explained in Sect. 3.


2 Construction of the Cohomological Hall Algebras


For illustration purposes, in this section we take the OCT to be the intersection theory
CH. Most statements have counterparts in an arbitrary oriented cohomology theory. 


2.1 The Theorem 


Let $Q = (I, H)$ be an arbitrary quiver. Denote by $\mathfrak{g}_Q$ the corresponding symmetric
Kac-Moody Lie algebra associated to $Q$. The preprojective algebra, denoted by $\Pi_Q$,
is the quotient of the path algebra $\mathbb{C}(Q \cup Q^{op})$ by the ideal generated by the relation
$\sum_{x \in H} [x, x^*] = 0$, where $x^* \in H^{op}$ is the reversed arrow of $x \in H$. Fix a dimension
vector $v \in \mathbb{N}^I$, let $\mathrm{Rep}(Q, v)$ be the affine space parametrizing representations of
the path algebra $\mathbb{C}Q$ of dimension $v$, and let $\mathrm{Rep}(\Pi_Q, v)$ be the affine algebraic
variety parametrizing representations of $\Pi_Q$ of dimension $v$, with an action of
$GL_v := \prod_{i \in I} GL_{v^i}$. Here $v = (v^i)_{i \in I}$. Let $\mathfrak{Rep}(\Pi_Q, v) := \mathrm{Rep}(\Pi_Q, v)/GL_v$ be
the quotient stack.


Example 2 Let $Q$ be the Jordan quiver. The preprojective algebra $\Pi_Q =
\mathbb{C}[x, x^*]$ is the free polynomial ring in two variables. We have


$$\mathfrak{Rep}(\Pi_Q, n) = \{(A, B) \in (\mathfrak{gl}_n)^2 \mid [A, B] = 0\}/GL_n.$$


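Consider the graded vector space


$$\mathcal{P}(CH, Q) := \bigoplus_{v \in \mathbb{N}^I} CH_{\mathbb{C}^*}(\mathfrak{Rep}(\Pi_Q, v)) = \bigoplus_{v \in \mathbb{N}^I} CH_{GL_v \times \mathbb{C}^*}(\mathrm{Rep}(\Pi_Q, v)).$$ (1)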


The torus $\mathbb{C}^*$ acts on $\mathrm{Rep}(\Pi_Q, v)$ the same way as in [21, (2.7.1) and (2.7.2)]. More
explicitly, let $a$ be the number of arrows in $Q$ from vertex $i$ to $j$. We enumerate these
arrows as $h_1, \dots, h_a$. The corresponding reversed arrows in $Q^{op}$ are enumerated as
$h_1^*, \dots, h_a^*$. We define an action of $\mathbb{C}^*$ on $\mathrm{Rep}(\Pi_Q, v)$ in the following way. For
$t \in \mathbb{C}^*$ and $(B_p, B_p^*) \in \mathrm{Rep}(\Pi_Q, v)$ with $h_p \in H$, we define


$$t \cdot B_p := t^{a + 2 - 2p} B_p, \qquad t \cdot B_p^* := t^{-a + 2p} B_p^*.$$


Theorem A ([27, 28, Yang-Zhao])


1. The vector space $\mathcal{P}(CH)$ is naturally endowed with a product $\star$ and a coproduct
$\Delta$, making it a bialgebra.

2. On $D(\mathcal{P}) := \mathcal{P} \otimes \mathcal{P}^{coop}$, there is a bialgebra structure obtained as a Drinfeld double
of $\mathcal{P}(CH)$. For any $w \in \mathbb{N}^I$, the algebra $D(\mathcal{P})$ acts on $CH_{GL_w}(\mathfrak{M}(w))$.
3. Assume $Q$ has no edge loops. There is a certain spherical subalgebra $D(\mathcal{P}^{sph}) \subseteq
D(\mathcal{P}(CH))$, such that


$$D(\mathcal{P}^{sph}) \cong Y_\hbar(\mathfrak{g}_Q).$$


Furthermore, the induced Yangian action from (2) is compatible with the action
constructed by Varagnolo in [24], and, in the case when $Q$ is of ADE type, the
action of Maulik-Okounkov in [17].


Remark 1 The construction of the Drinfeld double involves adding a Cartan
subalgebra and defining a bialgebra pairing, as can be found in detail in [27, §3]. The
Cartan subalgebra can alternatively be replaced by a symmetric monoidal structure
as in [26, §5]. The definition of the spherical subalgebra can be found in [27, §3.2].


2.2 Constructions


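The Hall multiplication $\star$ of $\mathcal{P}(CH)$ is defined using the following correspondence:


$$\mathfrak{Rep}(\Pi_Q, v_1) \times \mathfrak{Rep}(\Pi_Q, v_2) \xleftarrow{\ \phi\ } \mathfrak{Ext} \xrightarrow{\ \psi\ } \mathfrak{Rep}(\Pi_Q, v_1 + v_2),$$


where $\mathfrak{Ext}$ is the moduli space of extensions $\{0 \to V_1 \to V \to V_2 \to 0 \mid
\dim(V_i) = v_i,\ i = 1, 2\}$. The map $\phi\colon (0 \to V_1 \to V \to V_2 \to 0) \mapsto (V_1, V_2)$
is smooth, and $\psi\colon (0 \to V_1 \to V \to V_2 \to 0) \mapsto V$ is proper. The Hall
multiplication $\star$ is defined to be


$$\star = \psi_* \circ \phi^*.$$


Here the stacks $\mathfrak{Rep}(\Pi_Q, v)$ for $v \in \mathbb{N}^I$ are endowed with obstruction theories
obtained from the embeddings $\mathfrak{Rep}(\Pi_Q, v) \hookrightarrow T^*\mathrm{Rep}(Q, v)/GL_v$, and $\mathfrak{Ext}$ has
a similar obstruction theory described in detail in [28, §4.1]. The same applies to the
construction of the action below.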

Now we explain the action in Theorem A (2). Let $\mathfrak{Rep}^{fr}(\Pi_Q, v, w)$ be the moduli
space of framed representations of $\Pi_Q$ with dimension vector $(v, w)$, which is
constructed as a quotient stack $\mathrm{Rep}^{fr}(\Pi_Q, v, w)/GL_v$. Imposing a suitable semistability
condition, explained in detail in [21], we get an open subset $\mathfrak{Rep}^{fr,ss}(\Pi_Q, v, w) \subset
\mathfrak{Rep}^{fr}(\Pi_Q, v, w)$. There is an isomorphism


$$CH(\mathfrak{Rep}^{fr,ss}(\Pi_Q, v, w)) \cong CH_{GL_w}(\mathfrak{M}(v, w)).$$
We have the following general correspondence [28, §4.1]:

$$\mathfrak{Rep}^{fr}(\Pi_Q, v_1, w_1) \times \mathfrak{Rep}^{fr}(\Pi_Q, v_2, w_2) \xleftarrow{\ \phi\ } \mathfrak{Ext}^{fr} \xrightarrow{\ \psi\ } \mathfrak{Rep}^{fr}(\Pi_Q, v_1 + v_2, w_1 + w_2).$$ (2)


The action in Theorem A (2) is defined as $(\psi^{ss})_* \circ (\phi^{ss})^*$ by taking $w_1 = 0$, and
imposing a suitable semistability condition on the correspondence (2).


2.3 Shuffle Algebra 


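With notation as before, let $Q = (I, H)$ be a quiver. Following the proof of [28,
Proposition 3.4], we have the following shuffle description of $(\psi)_* \circ (\phi)^*$ in (2).
Let SH be an $\mathbb{N}^I \times \mathbb{N}^I$-graded $\mathbb{C}[t_1, t_2]$-algebra. As a $\mathbb{C}[t_1, t_2]$-module, we have


$$SH = \bigoplus_{v \in \mathbb{N}^I,\, w \in \mathbb{N}^I} SH_{v,w},$$

where

$$SH_{v,w} := \mathbb{C}[t_1, t_2] \otimes \mathbb{C}[\lambda^i_s]^{\mathfrak{S}_v}_{i \in I,\, s = 1, \dots, v^i} \otimes \mathbb{C}[z^j_t]^{\mathfrak{S}_w}_{j \in I,\, t = 1, \dots, w^j}.$$


Here $\mathfrak{S}_v = \prod_{i \in I} \mathfrak{S}_{v^i}$ is the product of symmetric groups, and $\mathfrak{S}_v$ naturally acts
on the variables $\{\lambda^i_s\}_{i \in I,\, s = 1, \dots, v^i}$ by permutation. For any $(v_1, w_1)$ and $(v_2, w_2) \in
\mathbb{N}^I \times \mathbb{N}^I$, we consider $SH_{v_1,w_1} \otimes_{\mathbb{C}[t_1,t_2]} SH_{v_2,w_2}$ as a subalgebra of


$$\mathbb{C}[t_1, t_2][\lambda^i_s, z^j_t]_{\{i \in I,\ s = 1, \dots, (v_1 + v_2)^i,\ j \in I,\ t = 1, \dots, (w_1 + w_2)^j\}}.$$


We define the shuffle product $SH_{v_1,w_1} \otimes_{\mathbb{C}[t_1,t_2]} SH_{v_2,w_2} \to SH_{v_1+v_2,w_1+w_2}$,


$$f(\lambda_{v_1}, z_{w_1}) \otimes g(\lambda_{v_2}, z_{w_2}) \mapsto \sum_{\sigma \in Sh(v_1,v_2) \times Sh(w_1,w_2)} \sigma\big(f(\lambda_{v_1}, z_{w_1}) \cdot g(\lambda_{v_2}, z_{w_2}) \cdot fac_{v_1+v_2,w_1+w_2}\big),$$ (3)


with $fac_{v_1+v_2,w_1+w_2}$ specified as follows.
Let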

vi 


fin 
fac = TT (4) 
S 


ie! s=1t=1 
Let $m\colon H \coprod H^{op} \to \mathbb{Z}$ be a function, which for each $h \in H$ provides two
integers $m_h$ and $m_{h^*}$. We define the action of the torus $T = (\mathbb{C}^*)^2$ on $\mathrm{Rep}(Q \cup Q^{op})$
according to the function $m$, satisfying some technical conditions spelled out in [27,
Assumption 1.1]. The $T$-equivariant variables are denoted by $t_1$, $t_2$. Define


you) yin) (h) vgn) 


in 
cal 


es Il ( ll ll Ge = to + mat) I I ao = eo) ae my-t2)) 


heH s=l t=l s=l1 t=1 
vi wi vs 


iw 
THT" =! +0 [] [a - <i +) (5) 


iel s=1t=1 s=lt=1 


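Let


$$fac_{v_1+v_2,w_1+w_2} := fac_1 \cdot fac_2.$$ (6)
Proposition 1 Under the identification


$$CH_{GL_v \times GL_w \times T}(\mathrm{Rep}^{fr}(\Pi_Q, v, w)) = \mathbb{C}[t_1, t_2] \otimes \mathbb{C}[\lambda^i_s]^{\mathfrak{S}_v}_{i \in I,\, s = 1, \dots, v^i} \otimes \mathbb{C}[z^j_t]^{\mathfrak{S}_w}_{j \in I,\, t = 1, \dots, w^j} =: SH_{v,w},$$

the map $(\psi)_* \circ (\phi)^*$ is equal to the multiplication (3) of the shuffle algebra $SH =
\bigoplus_{v \in \mathbb{N}^I,\, w \in \mathbb{N}^I} SH_{v,w}$.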


Proof The proof is the same as that of [28, Proposition 3.4], with the
quiver $Q$ replaced by the framed quiver $Q^\heartsuit$.


Remark 2 


1. For an arbitrary cohomology theory, Proposition 1 is still true when $A + B$ is
replaced by $A +_F B$ in the formula (6) of $fac_{v_1+v_2,w_1+w_2}$, where $F$ is the formal
group law associated to this cohomology theory.

2. Restricting to the open subset $\mathfrak{Rep}^{fr,ss}(\Pi_Q, v, w)$ of $\mathfrak{Rep}^{fr}(\Pi_Q, v, w)$ induces a
surjective map $SH_{v,w} \to CH_{GL_w}(\mathfrak{M}(v, w))$, for $v \in \mathbb{N}^I$, $w \in \mathbb{N}^I$. The
surjectivity follows from [18]. This map is compatible with the shuffle product on
the left-hand side and the multiplication on the right-hand side induced from (2).


2.4 Drinfeld Currents 


Let $v = e_i$ be the dimension vector valued 1 at vertex $i \in I$ and zero otherwise.
Then $\mathcal{P}_{e_i} = CH_{GL_{e_i} \times \mathbb{C}^*}(\mathrm{Rep}(\Pi_Q, e_i)) = \mathbb{C}[\hbar][x_i]$. Let $(x_i)^k \in \mathcal{P}_{e_i}$, $k \in \mathbb{N}$, be the
natural monomial basis of $\mathcal{P}_{e_i}$. One can form the Drinfeld currents


Ri wss ) Gyre, rer. 


keN 


By Theorem A, the generating series $\{X_i^+(u) \mid i \in I\}$ satisfy the relations of
$Y_\hbar^+(\mathfrak{g})[[u]]$ [28, §7].


3 Sheafified Elliptic Quantum Groups 


In this section, we apply the equivariant elliptic cohomology to the construction
in Sect. 2. It gives a sheafified elliptic quantum group, as well as its action on $\mathfrak{M}(w)$.


3.1 Equivariant Elliptic Cohomology 


There is a sheaf-theoretic definition of the equivariant elliptic cohomology theory
$\mathrm{Ell}_G$ in [11] by Ginzburg-Kapranov-Vasserot. It was investigated by many others
later on, including Ando [2, 3], Chen [5], Gepner [10], Goerss-Hopkins [12],
Lurie [16].

Let $\mathfrak{A}_G$ be the moduli scheme of semisimple semistable degree 0 $G$-bundles
over an elliptic curve. For a $G$-variety $X$, the $G$-equivariant elliptic cohomology
$\mathrm{Ell}_G(X)$ of $X$ is a quasi-coherent sheaf of $\mathcal{O}_{\mathfrak{A}_G}$-modules, satisfying certain axioms.
In particular, $\mathrm{Ell}_G(pt) \cong \mathcal{O}_{\mathfrak{A}_G}$.


Example 3 


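1. Let $G = S^1$, then $\mathrm{Ell}_{S^1}(X)$ is a coherent sheaf on $\mathrm{Pic}(E) \cong E$. This fact leads to
the following pattern.


• $CH_{S^1}(X)$ is a module over $CH_{S^1}(pt) = \mathcal{O}_{\mathbb{C}}$.
• $K_{S^1}(X)$ is a module over $K_{S^1}(pt) = \mathcal{O}_{\mathbb{C}^*}$.
• $\mathrm{Ell}_{S^1}(X)$ is a module over $\mathrm{Ell}_{S^1}(pt) = \mathcal{O}_E$.


2. Let $G = GL_n$, then $\mathrm{Ell}_{GL_n}(X)$ is a coherent sheaf over $E^{(n)} = E^n/\mathfrak{S}_n$.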


There is a subtlety for pushforward in the equivariant elliptic cohomology theory.
Let $f\colon X \to Y$ be a proper, $G$-equivariant morphism. The pushforward $f_*$ is
the following map


$$f_*\colon \mathrm{Ell}_G(Th(f)) \to \mathrm{Ell}_G(Y),$$
where $\mathrm{Ell}_G(Th(f))$, depending on the Thom bundle of the relative tangent bundle
of $f$, is a rank 1 locally free module over $\mathrm{Ell}_G(X)$. The appearance of this twist
can be illustrated in the following simple example. The general case is discussed in
detail in [11, § 2] and [29, § 2].


Example 4 Let $f\colon \{0\} \to \mathbb{C}$ be the inclusion. The torus $S^1$ acts on $\mathbb{C}$ by scalar
multiplication. Denote by $D$ the disc around 0. We have the Thom space $Th :=
D/S^1$. There is an exact sequence


$$0 \to \mathrm{Ell}_{S^1}(Th) \to \mathrm{Ell}_{S^1}(D) \to \mathrm{Ell}_{S^1}(S^1) \to 0.$$


As $\mathrm{Ell}_{S^1}(D) \cong \mathcal{O}_E$, since $D$ is contractible, and $\mathrm{Ell}_{S^1}(S^1)$ is the skyscraper sheaf
$\mathbb{C}_0$ at 0, we have the isomorphism $\mathrm{Ell}_{S^1}(Th) \cong \mathcal{O}(-\{0\})$.


3.2 The Sheafified Elliptic Quantum Group 
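
Recall that the elliptic cohomological Hall algebra (see (1)) is defined as


$$\mathcal{P}(\mathrm{Ell}, Q) = \bigoplus_{v \in \mathbb{N}^I} \mathrm{Ell}_{GL_v \times \mathbb{C}^*}(\mathrm{Rep}(\Pi_Q, v)).$$


By the discussion in Sect. 3.1, $\mathcal{P}(\mathrm{Ell}, Q)$ is a sheaf on

$$\mathcal{H}_{E \times I} \times E_\hbar := \coprod_{\{v = (v^i)_{i \in I} \in \mathbb{N}^I\}} E^{(v^1)} \times E^{(v^2)} \times \cdots \times E^{(v^n)} \times E_\hbar,$$


where $E_\hbar$ comes from the $\mathbb{C}^*$-action, and $\mathcal{H}_{E \times I}$ is the moduli space of $I$-colored
points on $E$.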

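Due to the subtlety of pushing forward in the elliptic cohomology, there is no
product on $\mathcal{P}(\mathrm{Ell}, Q)$ in the usual sense. We illustrate the structure of the Hall
multiplication $\star$ of $\mathcal{P}(\mathrm{Ell}, Q)$ in the following example.

Example 5 Let $Q$ be the quiver with one vertex and no arrows. In this case, we have
$\mathfrak{g}_Q = \mathfrak{sl}_2$. The elliptic CoHA $\mathcal{P}(\mathrm{Ell}, \mathfrak{sl}_2)$ associated to the Lie algebra $\mathfrak{sl}_2$ consists
of:


• A coherent sheaf $\mathcal{P}(\mathrm{Ell}, \mathfrak{sl}_2) = (\mathcal{P}_n)_{n \in \mathbb{N}}$ on $\mathcal{H}_E \times E_\hbar = \coprod_{n \in \mathbb{N}} E^{(n)} \times E_\hbar$.
• For any $n, m \in \mathbb{N}$, a morphism of sheaves on $E^{(n+m)} \times E_\hbar$:


$$\star\colon (\Sigma_{n,m})_*\big((\mathcal{P}_n \boxtimes \mathcal{P}_m) \otimes \mathcal{L}_{n,m}\big) \to \mathcal{P}_{n+m},$$


where $\Sigma_{n,m}$ is the symmetrization map $E^{(n)} \times E^{(m)} \to E^{(n+m)}$ and $\mathcal{L}_{n,m}$ is some
fixed line bundle on $E^{(n)} \times E^{(m)} \times E_\hbar$, depending on the parameter $\hbar$.
• The above morphisms are associative in the obvious sense.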


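Let $\mathcal{C}$ be the category of coherent sheaves on $\mathcal{H}_{E \times I} \times E_\hbar$. Motivated by the Hall
multiplication $\star$ of $\mathcal{P}(\mathrm{Ell}, Q)$, we define a tensor structure $\otimes_\hbar$ on $\mathcal{C}$: for $\{\mathcal{F}\}, \{\mathcal{G}\} \in
\mathrm{Obj}(\mathcal{C})$, $\mathcal{F} \otimes_\hbar \mathcal{G}$ is defined as


$$(\mathcal{F} \otimes_\hbar \mathcal{G})_v := \bigoplus_{v_1 + v_2 = v} (\Sigma_{v_1,v_2})_*\big((\mathcal{F}_{v_1} \boxtimes \mathcal{G}_{v_2}) \otimes \mathcal{L}_{v_1,v_2}\big).$$ (7)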


Theorem B ([26, Yang-Zhao])


1. The category $(\mathcal{C}, \otimes_\hbar)$ is a symmetric monoidal category, with the braiding given
by Yang and Zhao [26, Theorem 3.3].

2. The elliptic CoHA $(\mathcal{P}(\mathrm{Ell}, Q), \star, \Delta)$, endowed with the Hall multiplication $\star$ and
coproduct $\Delta$, is a bialgebra object in $(\mathcal{C}^{loc}, \otimes_\hbar)$.

3. The Drinfeld double $D(\mathcal{P}(\mathrm{Ell}, Q))$ of $\mathcal{P}(\mathrm{Ell}, Q)$ acts on $\mathrm{Ell}_{G_w}(\mathfrak{M}(w))$, for any
$w \in \mathbb{N}^I$.

4. After taking a certain space of meromorphic sections $\Gamma_{rat}$, the bialgebra
$\Gamma_{rat}(D(\mathcal{P}^{sph}(\mathrm{Ell}, Q)))$ becomes the elliptic quantum group given by the
dynamical elliptic R-matrix of Felder [7], Gautam-Toledano Laredo [9].


Remark 3 


1. In Theorem B(4), a version of the sheafified elliptic quantum group with
dynamical twist, $D(\mathcal{P}^{\mathrm{sph}}_{\lambda}(\mathrm{Ell}, Q))$, is needed in order to recover the R-matrix of
Felder-Gautam-Toledano Laredo. This twist is explained in detail in [26, § 10.2].
We illustrate the flavour of this dynamical twist in Sect. 3.4.


In particular, the abelian categories of representations of $D(\mathcal{P}(\mathrm{Ell}, Q))$ and
$D(\mathcal{P}^{\mathrm{sph}}_{\lambda}(\mathrm{Ell}, Q))$ are both well-defined (see [26, § 9] for the details).
Furthermore, it is proved in loc. cit. that these two representation categories are
equivalent as abelian categories.

2. The details of the space of meromorphic sections are explained in [26, § 6].
3. Based on the above theorem, we define the sheafified elliptic quantum group to
be the Drinfeld double of $\mathcal{P}^{\mathrm{sph}}(\mathrm{Ell}, Q)$.
3.3 The Shuffle Formulas


The shuffle formula of the elliptic quantum group is given by (3), with each factor
$A + B$ replaced by $\vartheta(A + B)$. The shuffle formula gives an explicit description of
the elliptic quantum group, as well as its action on the Nakajima quiver varieties.
In this section, we illustrate the shuffle description of the elliptic quantum group
$E_{\tau,\hbar}(\mathfrak{sl}_2)$. Furthermore, we show that (3) applied to special cases coincides with the
shuffle formulas in [8, Proposition 3.6] and [13, Definition 5.9].

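When $\mathfrak{g} = \mathfrak{sl}_2$, the corresponding quiver has one vertex and no arrows. Let
$\mathcal{SH}_{w=0} := \mathcal{O}_{\mathcal{H}_E \times E_\hbar} = (\bigoplus_{n \in \mathbb{N}} \mathcal{O}_{E^{(n)}}) \boxtimes \mathcal{O}_{E_\hbar}$. For any local sections $f \in \mathcal{O}_{E^{(n)}}$, $g \in \mathcal{O}_{E^{(m)}}$, by (3) and (6), we have


\[ f \star g = \sum_{\sigma \in \mathrm{Sh}(n,m)} \sigma\Bigl( f g \prod_{1 \le s \le n,\ n+1 \le t \le n+m} \frac{\vartheta(x_s - x_t + \hbar)}{\vartheta(x_t - x_s)} \Bigr), \]


where $f \star g \in \mathcal{O}_{E^{(n+m)}}$ and $\mathrm{Sh}(n, m)$ consists of $(n, m)$-shuffles, i.e., permutations of
$\{1, \dots, n+m\}$ that preserve the relative order of $\{1, \dots, n\}$ and $\{n+1, \dots, n+m\}$.
The elliptic quantum group $E_{\tau,\hbar}(\mathfrak{sl}_2)$ is a subalgebra of $(\mathcal{SH}_{w=0}, \star)$.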
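The combinatorics of the shuffle product can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the construction of the text: shuffles are encoded as tuples $(\sigma(1), \dots, \sigma(n+m))$, a shuffle acts by substituting $x_{\sigma(i)}$ for $x_i$, the kernel is taken to be $\theta(x_s - x_t + \hbar)/\theta(x_t - x_s)$, and by default $\theta$ is the rational degeneration $\theta(x) = x$ rather than a genuine theta function. The names `shuffles` and `shuffle_product` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def shuffles(n, m):
    """Yield every (n, m)-shuffle as a tuple (sigma(1), ..., sigma(n+m)).

    sigma is increasing on {1..n} and on {n+1..n+m}, so it is determined
    by the n-element set of values taken on the first block.
    """
    total = n + m
    for first in combinations(range(1, total + 1), n):
        rest = tuple(i for i in range(1, total + 1) if i not in first)
        yield first + rest


def shuffle_product(f, g, n, m, hbar, theta=lambda x: x):
    """Return the function xs -> (f * g)(xs) for the assumed kernel.

    f takes a tuple of n values, g a tuple of m values. The default
    theta(x) = x is the rational degeneration (an assumption); pass a
    genuine theta function to get the elliptic kernel.
    """
    def product(xs):
        assert len(xs) == n + m
        result = 0
        for sigma in shuffles(n, m):
            ys = [xs[i - 1] for i in sigma]  # permuted variables
            term = f(tuple(ys[:n])) * g(tuple(ys[n:]))
            for s in range(n):
                for t in range(n, n + m):
                    term *= theta(ys[s] - ys[t] + hbar) / theta(ys[t] - ys[s])
            result += term
        return result
    return product


if __name__ == "__main__":
    one = lambda xs: 1
    hbar = Fraction(1, 3)
    print(shuffle_product(one, one, 2, 1, hbar)((Fraction(1), Fraction(4), Fraction(9))))
```

With the opposite sign convention in the denominator the result changes by the overall sign $(-1)^{nm}$; exact rational arithmetic via `Fraction` avoids rounding when checking identities by hand.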

In the following examples we consider SH with general w. 


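Example 6 Assume the quiver has one vertex and no arrows. Let $v_1 = k'$, $v_2 = k''$,
and $w_1 = n'$, $w_2 = n''$. Choose $t_1 = \hbar$ and $t_2 = 0$. Applying the formula (6) to
this case, we have


\[ \mathrm{fac}_{k'+k'', n'+n''} = \prod_{s=1}^{k'} \prod_{t=k'+1}^{k'+k''} \frac{\vartheta(\lambda_s - \lambda_t + \hbar)}{\vartheta(\lambda_t - \lambda_s)} \prod_{s=1}^{k'} \prod_{t=n'+1}^{n'+n''} \vartheta(z_t - \lambda_s + \hbar) \prod_{s=1}^{n'} \prod_{t=k'+1}^{k'+k''} \vartheta(\lambda_t - z_s), \]


where $\vartheta(z)$ is the odd Jacobi theta function, normalized such that $\vartheta'(0) = 1$. This
is exactly the same formula as [8, Proposition 3.6].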
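For numerical experiments it can help to have the normalized theta function available. The sketch below assumes the classical series $\theta_1(z \mid \tau) = 2 \sum_{n \ge 0} (-1)^n q^{(n+1/2)^2} \sin((2n+1)\pi z)$ with $q = e^{\pi i \tau}$ and periods $1$ and $\tau$, divided by its derivative at $0$; this choice of convention, the function name `theta`, and the truncation parameter `terms` are assumptions, since the text only fixes oddness and the normalization.

```python
import cmath
import math


def theta(z, tau, terms=30):
    """Odd Jacobi theta function theta_1(z | tau) divided by its derivative at 0.

    Uses the truncated series with q = exp(i*pi*tau); requires Im(tau) > 0.
    The overall factor 2 of theta_1 cancels in the quotient.
    """
    numerator = 0j
    derivative_at_zero = 0j
    for n in range(terms):
        # coefficient (-1)^n * q^{(n + 1/2)^2}
        coeff = (-1) ** n * cmath.exp(1j * math.pi * tau * (n + 0.5) ** 2)
        numerator += coeff * cmath.sin((2 * n + 1) * math.pi * z)
        derivative_at_zero += coeff * (2 * n + 1) * math.pi
    return numerator / derivative_at_zero


if __name__ == "__main__":
    tau = 1j
    z = 0.3 + 0.1j
    print(theta(z, tau), theta(-z, tau))
```

Such a function can be passed as the kernel ingredient of a shuffle product in place of the rational degeneration $x \mapsto x$.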


Example 7 When $\mathfrak{g} = \mathfrak{sl}_N$, the corresponding quiver is


○ — ○ — ··· — ○,


with $N - 1$ vertices, and $N - 2$ arrows. Consider the framed quiver $Q^{\heartsuit}$.

[Figure: the framed quiver $Q^{\heartsuit}$]
Label the vertices of $Q^{\heartsuit}$ by $\{1, 2, \dots, N-1, N\}$. Let $v_1 = (v_1^{(l)})_{l=1,\dots,N-1}$,
$v_2 = (v_2^{(l)})_{l=1,\dots,N-1}$ be a pair of dimension vectors, and take the framing to
be $w_1 = (0, \dots, 0, n)$, $w_2 = (0, \dots, 0, m)$. To simplify the notations, let

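$v_1^{(N)} = n$, $v_2^{(N)} = m$, and denote the variables $z_{w_1}$ by $\{\lambda'^{(N)}_s\}_{s=1,\dots,v_1^{(N)}}$ and $z_{w_2}$
by $\{\lambda''^{(N)}_t\}_{t=1,\dots,v_2^{(N)}}$. Applying the formula (6) to this case, we then have


\[ \mathrm{fac}_{v_1+v_2, w_1+w_2} = \prod_{l=1}^{N-1} \Bigl( \prod_{s=1}^{v_1^{(l)}} \prod_{t=1}^{v_2^{(l)}} \frac{\vartheta(\lambda'^{(l)}_s - \lambda''^{(l)}_t + t_1 + t_2)}{\vartheta(\lambda''^{(l)}_t - \lambda'^{(l)}_s)} \Bigr) \cdot \prod_{l=1}^{N-1} \Bigl( \prod_{s=1}^{v_1^{(l)}} \prod_{t=1}^{v_2^{(l+1)}} \vartheta(\lambda''^{(l+1)}_t - \lambda'^{(l)}_s + t_1) \prod_{t=1}^{v_2^{(l)}} \prod_{s=1}^{v_1^{(l+1)}} \vartheta(\lambda''^{(l)}_t - \lambda'^{(l+1)}_s + t_2) \Bigr). \]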
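Following [13, §5 (5.3)], we denote by $H_{v+w}(\lambda_v, z_w)$ the following element


\[ H_{v+w}(\lambda_v, z_w) = \prod_{l=1}^{N-1} \prod_{s=1}^{v^{(l)}} \prod_{t=1}^{v^{(l+1)}} \vartheta(\lambda^{(l+1)}_t - \lambda^{(l)}_s + t_1). \]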


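Define

\[ H_{\mathrm{cross}} = \frac{H_{v_1+v_2+w_1+w_2}(\lambda_{v_1} \cup \lambda_{v_2}, z_{w_1} \cup z_{w_2})}{H_{v_1+w_1}(\lambda_{v_1}; z_{w_1}) \cdot H_{v_2+w_2}(\lambda_{v_2}; z_{w_2})}. \]

We have

\[ \begin{aligned} H_{\mathrm{cross}} &= \prod_{l=1}^{N-1} \Bigl( \prod_{s=1}^{v_1^{(l)}} \prod_{t=1}^{v_2^{(l+1)}} \vartheta(\lambda''^{(l+1)}_t - \lambda'^{(l)}_s + t_1) \prod_{t=1}^{v_2^{(l)}} \prod_{s=1}^{v_1^{(l+1)}} \vartheta(\lambda'^{(l+1)}_s - \lambda''^{(l)}_t + t_1) \Bigr) \\ &= \prod_{l=1}^{N-1} \Bigl( \prod_{s=1}^{v_1^{(l)}} \prod_{t=v_1^{(l+1)}+1}^{v_1^{(l+1)}+v_2^{(l+1)}} \vartheta(\lambda^{(l+1)}_t - \lambda^{(l)}_s + t_1) \prod_{t=v_1^{(l)}+1}^{v_1^{(l)}+v_2^{(l)}} \prod_{s=1}^{v_1^{(l+1)}} \vartheta(\lambda^{(l+1)}_s - \lambda^{(l)}_t + t_1) \Bigr). \end{aligned} \tag{8} \]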
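Dividing $\mathrm{fac}_{v_1+v_2, w_1+w_2}$ by $H_{\mathrm{cross}}$, we obtain


\[ \frac{\mathrm{fac}_{v_1+v_2, w_1+w_2}}{H_{\mathrm{cross}}} = \prod_{l=1}^{N-1} \Bigl( \prod_{s=1}^{v_1^{(l)}} \prod_{t=1}^{v_2^{(l)}} \frac{\vartheta(\lambda'^{(l)}_s - \lambda''^{(l)}_t + t_1 + t_2)}{\vartheta(\lambda''^{(l)}_t - \lambda'^{(l)}_s)} \Bigr) \cdot \prod_{l=1}^{N-1} \prod_{t=1}^{v_2^{(l)}} \prod_{s=1}^{v_1^{(l+1)}} \frac{\vartheta(\lambda''^{(l)}_t - \lambda'^{(l+1)}_s + t_2)}{\vartheta(\lambda'^{(l+1)}_s - \lambda''^{(l)}_t + t_1)}. \]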
This coincides with the formula in [13, Definition 5.9] when $t_1 = -1$, $t_2 = 0$.
In other words, consider the map


\[ \bigoplus_{v \in \mathbb{N}^I,\ w \in \mathbb{N}^I} \mathcal{SH}_{v,w} \to \bigoplus_{v \in \mathbb{N}^I,\ w \in \mathbb{N}^I} (\mathcal{SH}_{v,w})_{\mathrm{loc}}, \quad \text{given by} \quad f(\lambda_v, z_w) \mapsto \frac{f(\lambda_v, z_w)}{H_{v+w}(\lambda_v, z_w)}. \]


It intertwines the shuffle product (3) (with theta-functions) and the shuffle product
of [13, Definition 5.9].
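The mechanism behind such an intertwining statement can be sketched schematically. The derivation below is only an outline under two assumptions: that the shuffle product has the form of a sum over shuffles $\sigma$ of a kernel $\mathrm{fac}$ times $f \cdot g$, and that $H_{v+w}$ is symmetric in the variables attached to each vertex, so that it is fixed by every $\sigma$. The symbol $\varphi$ for the rescaling map is notation introduced here for illustration.

```latex
\begin{align*}
\varphi(f) &:= \frac{f}{H_{v+w}}, \qquad
f \star g = \sum_{\sigma} \sigma\bigl(f \cdot g \cdot \mathrm{fac}\bigr), \\
\varphi(f \star g)
 &= \frac{1}{H_{v_1+v_2+w_1+w_2}} \sum_{\sigma} \sigma\bigl(f \cdot g \cdot \mathrm{fac}\bigr)
  = \sum_{\sigma} \sigma\Bigl(\frac{f \cdot g \cdot \mathrm{fac}}{H_{v_1+v_2+w_1+w_2}}\Bigr) \\
 &= \sum_{\sigma} \sigma\Bigl(\varphi(f) \cdot \varphi(g) \cdot \frac{\mathrm{fac}}{H_{\mathrm{cross}}}\Bigr),
 \qquad
 H_{\mathrm{cross}} = \frac{H_{v_1+v_2+w_1+w_2}}{H_{v_1+w_1} \cdot H_{v_2+w_2}}.
\end{align*}
```

In other words, after rescaling by $H$ the kernel of the product becomes $\mathrm{fac}/H_{\mathrm{cross}}$, which is why the quotient by $H_{\mathrm{cross}}$ is the quantity compared with [13].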


Remark 4 


1. In the work of [8] in the $\mathfrak{sl}_2$ case, and [13] in the $\mathfrak{sl}_N$ case, the shuffle
formulas are used to obtain an inductive formula for the elliptic weight functions.
Proposition 1 provides a way to define the elliptic weight functions associated to
a general symmetric Kac-Moody Lie algebra $\mathfrak{g}$.

2. In the above cases for $\mathfrak{sl}_N$ and the special framing, the elliptic weight functions
are expected to be related to the elliptic stable basis in [1] (see also [8, 13]).
Therefore, it is reasonable to expect an expression of the elliptic stable basis in
terms of the shuffle product (3) (with theta-functions) for general quiver varieties.


3.4 Drinfeld Currents 


We now explain the Drinfeld currents in the elliptic case. The choice of an elliptic
curve $E$ gives rise to the dynamical elliptic quantum group.

Let $\mathcal{M}_{1,2}$ be the open moduli space of 2-pointed genus 1 curves. We write a point
in $\mathcal{M}_{1,2}$ as $(E_\tau, \lambda)$, where $E_\tau = \mathbb{C}/(\mathbb{Z} \oplus \tau\mathbb{Z})$, and $\lambda$ gives a line bundle $\mathcal{L}_\lambda \in \mathrm{Pic}(E_\tau)$.
Let $\mathcal{E}$ be the universal curve on $\mathcal{M}_{1,2}$. There is a Poincaré line bundle $\mathcal{L}$ on $\mathcal{E}$, which
has a natural rational section


\[ \frac{\vartheta(z + \lambda)}{\vartheta(z)\,\vartheta(\lambda)}, \]


where $z$ is the coordinate of $E_\tau$, and $\vartheta(z)$ is the Jacobi odd theta function,
normalized such that $\vartheta'(0) = 1$.
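As a consistency check, one can verify directly that this function behaves like a rational section of a degree-zero line bundle with a simple pole at the origin. The sketch below assumes the standard quasi-periodicity of the normalized odd theta function with respect to the lattice $\mathbb{Z} \oplus \tau\mathbb{Z}$; this convention is an assumption, since the text only fixes oddness and the normalization.

```latex
\begin{align*}
&\vartheta(z+1) = -\vartheta(z), \qquad
 \vartheta(z+\tau) = -e^{-\pi i \tau - 2\pi i z}\,\vartheta(z), \\
&\frac{\vartheta(z+1+\lambda)}{\vartheta(z+1)\,\vartheta(\lambda)}
   = \frac{\vartheta(z+\lambda)}{\vartheta(z)\,\vartheta(\lambda)}, \qquad
 \frac{\vartheta(z+\tau+\lambda)}{\vartheta(z+\tau)\,\vartheta(\lambda)}
   = e^{-2\pi i \lambda}\,\frac{\vartheta(z+\lambda)}{\vartheta(z)\,\vartheta(\lambda)}, \\
&\frac{\vartheta(z+\lambda)}{\vartheta(z)\,\vartheta(\lambda)}
   = \frac{1}{z} + O(1) \quad (z \to 0), \qquad
   \text{since } \vartheta(0) = 0,\ \vartheta'(0) = 1 .
\end{align*}
```

The multiplier $e^{-2\pi i \lambda}$ under $z \mapsto z + \tau$ depends only on $\lambda$, which is how $\lambda$ records a point of $\mathrm{Pic}(E_\tau)$ under these conventions.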

We can twist the equivariant elliptic cohomology $\mathrm{Ell}_G$ by the Poincaré line bundle
$\mathcal{L}$. For each simple root $e_k$, after twisting, we have $\mathcal{SH}^{\mathcal{L}}_{e_k} = \mathcal{O}_{E^{(e_k)}} \otimes \mathcal{L}$. A basis of the


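meromorphic sections $\Gamma_{\mathrm{rat}}(\mathcal{SH}^{\mathcal{L}}_{e_k})$ consists of $\Bigl\{ g^{(i)}_{\lambda_k}(z_k) := \frac{1}{i!} \frac{\partial^i}{\partial z_k^i} \Bigl( \frac{\vartheta(z_k + \lambda_k)}{\vartheta(z_k)\,\vartheta(\lambda_k)} \Bigr) \Bigr\}_{i \in \mathbb{N}}$.
Consider $\lambda = (\lambda_k)_{k \in I}$, and let


\[ X_k^+(u, \lambda) := \sum_{i=0}^{\infty} g^{(i)}_{\lambda_k}(z_k)\, u^i = \frac{\vartheta(z_k + \lambda_k + u)}{\vartheta(z_k + u)\,\vartheta(\lambda_k)}. \]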


Similarly, we define series $X_k^-(u, \lambda)$, $\Phi_k(u)$. The series $X_k^+(u, \lambda)$, $X_k^-(u, \lambda)$, and
$\Phi_k(u)$ satisfy the relations of the elliptic quantum group of Gautam-Toledano
Laredo [9].


4 Relation with the Affine Grassmannian 


In this section, we explain one unexpected feature of the sheafified elliptic quantum
group, namely, its relation with the global loop Grassmannians over an elliptic curve
$E$. We assume the quiver $Q = (I, H)$ is of type ADE in this section.

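We collect some facts from [19, 20]. Let $C$ be a curve. An $I$-colored local space
$Z$ over $C$, defined in [19, Section 2], see also [20, Section 4.1], is a space $Z \to
\mathcal{H}_{C \times I}$ over the $I$-colored Hilbert scheme of points of $C$, together with a consistent
system of isomorphisms


\[ Z_D \times Z_{D'} \to Z_{D \sqcup D'}, \quad \text{for } D, D' \in \mathcal{H}_{C \times I} \text{ with } D \cap D' = \emptyset. \]


Similarly, for a coherent sheaf $\mathcal{F}$ on $\mathcal{H}_{C \times I}$, a locality structure of $\mathcal{F}$ on $\mathcal{H}_{C \times I}$ is a
compatible system of identifications


\[ \iota_{D,D'} : \mathcal{F}_D \boxtimes \mathcal{F}_{D'} \cong \mathcal{F}_{D \sqcup D'}, \quad \text{for } D, D' \in \mathcal{H}_{C \times I} \text{ with } D \cap D' = \emptyset. \]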


An example of local spaces is the zastava space $\mathcal{Z} \to \mathcal{H}_{C \times I}$, recollected in
detail in [19, Section 3] and [20, Section 4.2.3]. Here $C$ is a smooth curve. In
Mirković (personal communication), a modular description of $\mathcal{Z}$ is given along the
lines of Drinfeld's compactification. Let $G$ be the semisimple simply-connected
group associated to $Q$, with the choice of opposite Borel subgroups $B^+ = TN^+$
and $B^- = TN^-$ with the joint Cartan subgroup $T$. Consider Drinfeld's
compactification $\mathcal{Y}_G$ of a point:


\[ \mathcal{Y}_G = G \backslash [(G/N^+)^{\mathrm{aff}} \times (G/N^-)^{\mathrm{aff}}] /\!/ T. \]


The zastava space $\mathcal{Z} \to \mathcal{H}_{C \times I}$ for $G$ is defined as the moduli of generic maps from
$C$ to $\mathcal{Y}_G$. Gluing the zastava spaces, one gets a loop Grassmannian $\mathcal{G}r$ as a local space
over $\mathcal{H}_{C \times I}$, which is a refined version of the Beilinson-Drinfeld Grassmannian, see
[19, Section 3] and [20, Section 4.2.3].

Fix a point $c \in C$ and a dimension vector $v \in \mathbb{N}^I$, and let $[c] \in C^{(v)} \subseteq \mathcal{H}_{C \times I}$ be
the special divisor supported on $\{c\}$. The fiber $\mathcal{Z}_{[c]}$ is the Mirković-Vilonen scheme
associated to the root vector $v$, i.e., the intersection of closures of certain
semi-infinite orbits in $G_{\mathbb{C}((z))}/G_{\mathbb{C}[[z]]}$ [19, Section 3].

The maximal torus $T \subset G$ acts on $\mathcal{Z}$. There is a certain component in the
torus-fixed locus $(\mathcal{Z})^T$, which gives a section $\mathcal{H}_{C \times I} \subset (\mathcal{Z})^T$. We denote this component by
$(\mathcal{Z})^{T\circ}$. The tautological line bundle $\mathcal{O}_{\mathcal{G}r}(1)|_{\mathcal{Z}^{T\circ}}$ has a natural locality structure, and
is described in [19, Theorem 3.1].

We now take the curve $C$ to be the elliptic curve $E$. Let $\mathcal{G}r \to \mathcal{H}_{E \times I}$ be the global
loop Grassmannian over $\mathcal{H}_{E \times I}$. The following theorem relies on the description of
the local line bundle on $\mathcal{H}_{C \times I}$ in [19] and [20, Section 4.2.1].


Theorem C ([26] Yang-Zhao) 


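1. The classical limit $\mathcal{P}^{\mathrm{sph}}(\mathrm{Ell}, Q)|_{\hbar=0}$ is isomorphic to $\mathcal{O}_{\mathcal{G}r}(-1)|_{\mathcal{Z}^{T\circ}}$ as sheaves on
$\mathcal{H}_{E \times I}$.

2. The Hall multiplication $\star$ on $\mathcal{P}^{\mathrm{sph}}(\mathrm{Ell}, Q)|_{\hbar=0}$ is equivalent to the locality
structure on $\mathcal{O}_{\mathcal{G}r}(1)|_{\mathcal{Z}^{T\circ}}$.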


Remark 5 


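1. Theorem C is true when the curve $E$ is replaced by $\mathbb{C}$ (and $\mathbb{C}^*$), while the
corresponding cohomological Hall algebra is modified to $\mathcal{P}^{\mathrm{sph}}(\mathrm{CH}, Q)|_{\hbar=0}$
(and $\mathcal{P}^{\mathrm{sph}}(K, Q)|_{\hbar=0}$ respectively). The sheaf $\mathcal{P}^{\mathrm{sph}}(\mathrm{Ell}, Q)$ deforms the local
line bundle $\mathcal{O}_{\mathcal{G}r}(-1)|_{\mathcal{Z}^{T\circ}}$. In the classification of local line bundles in [19],
$\mathcal{O}_{\mathcal{G}r}(1)|_{\mathcal{Z}^{T\circ}}$ is characterized by a certain diagonal divisor of $\mathcal{H}_{C \times I}$ [20, Section
4.2.1]. As a consequence of Theorem C, the shuffle formula of $\mathcal{P}^{\mathrm{sph}}(\mathrm{Ell}, Q)$ gives
the $\hbar$-shifting of the diagonal divisor of $\mathcal{H}_{E \times I}$ that appears in $\mathcal{O}_{\mathcal{G}r}(1)|_{\mathcal{Z}^{T\circ}}$.

2. When the base is $\mathcal{H}_{\mathbb{C} \times I}$, and the cohomology theory is the Borel-Moore homology
$H_{\mathrm{BM}}$, Theorem C(1) has a similar flavour as [4, Theorem 3.1]. Here, we only
consider $\mathcal{P}^{\mathrm{sph}}(H_{\mathrm{BM}}, Q)|_{\hbar=0}$ and $\mathcal{O}_{\mathcal{G}r}(-1)|_{\mathcal{Z}^{T\circ}}$ as sheaves of abelian groups. By
Theorem A, $(\mathcal{P}^{\mathrm{sph}}(H_{\mathrm{BM}}, Q), \star)$ is isomorphic to the positive part of the Yangian,
which is in turn related to $\mathcal{O}_{\mathcal{G}r}(-1)|_{\mathcal{Z}^{T\circ}}$ by Theorem C.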